diff -r -u -P mailman-2.0.9-index/INSTALL mailman-2.0.9-htdig/INSTALL
--- mailman-2.0.9-index/INSTALL Thu Nov 16 21:57:37 2000
+++ mailman-2.0.9-htdig/INSTALL Mon Apr 8 18:00:35 2002
@@ -333,6 +333,11 @@
mailman site administrator the ability to adjust these things
when necessary.
+ - If you want to use htdig for searching your mail archives using
+ the Mailman-htdig integration developed by Richard Barrett
+ (r.barrett@ftel.co.uk) then see the instructions in
+ INSTALL.htdig-mm.
+
6. Getting started
- Create a list named `test'. To do so, run the program
diff -r -u -P mailman-2.0.9-index/INSTALL.htdig-mm mailman-2.0.9-htdig/INSTALL.htdig-mm
--- mailman-2.0.9-index/INSTALL.htdig-mm Thu Jan 1 01:00:00 1970
+++ mailman-2.0.9-htdig/INSTALL.htdig-mm Mon Apr 8 18:18:20 2002
@@ -0,0 +1,865 @@
+Installing and Using the Mailman-htdig Integration
+==================================================
+
+This patch:
+
+http://sourceforge.net/tracker/index.php?func=detail&aid=444884&group_id=103&atid=300103
+
+Contents
+========
+
+Prerequisites
+Compatibility
+History
+Introduction
+Installing and Building Mailman with this patch
+What is Installed by the Patch
+Configuration of Mailman-htdig Integration
+ Health Warning on the packet!
+ Starting from Scratch (Again)
+ General
+ Local htdig Configuration
+ Remote htdig Configuration
+ Upgrading an Existing Standard Mailman Installation
+ Changing from local to remote htdig or vice versa
+ Coping with htdig Upgrades
+ Changing the Addressing Scheme of your web_page_url
+Operational Information
+Notes and Warnings
+Contributors
+Appendices
+ Appendix 1 -Technique for htdigging when Mailman's DEFAULT_URL uses the
+ https
+
+Prerequisites
+=============
+
+Prior to installing this patch you should also have installed the patch that
+provides enhanced indexing of Mailman archives; see:
+
+http://sourceforge.net/tracker/index.php?func=detail&aid=444879&group_id=103&atid=300103
+
+You must have a working installation of htdig with htsearch available via CGI on
+your HTTP server installed on either the machine on which you are running
+Mailman or on another machine which has access to Mailman list archives via NFS
+or some similarly competent network file sharing scheme.
+
+Regardless of how you configure things to provide Mailman's Web UI, if it gives
+normal operation of the /mailman/private CGI script for providing access to
+private list archives, it should also support access to htdig search results via
+the /mailman/htdig CGI script.
+
+Compatibility
+=============
+
+htdig-2.0.9-0.1.patch - Mailman 2.0.9
+
+htdig-2.0.8-0.1.patch - Mailman 2.0.8, 2.0.7, 2.0.6 and probably 2.0.3, 2.0.4
+and 2.0.5
+
+History
+=======
+
+Previous versions - original versions of this patch provided most of the
+features described here with the main exception being support for remote htdig,
+that is running htdig on a different system to Mailman. They also had some
+configuration assumptions baked in, which are now configurable.
+
+htdig-2.0.9-0.1.patch - latest version:
+
+ 1. minor cosmetic changes to get clean patch application to MM 2.0.9
+
+htdig-2.0.8-0.1.patch:
+
+ 1. resolves a problem with the integration of htdig when the web_page_url
+for a list, which is usually the same as DEFAULT_URL from either
+$prefix/Mailman/Defaults.py or $prefix/Mailman/mm_cfg.py, doesn't use the http
+addressing scheme. This arises because htdig will only build indices if the URLs
+for pages use the http addressing scheme. There is a work-around for this
+problem posted in htdig's mail archives - see the copy in Appendix 1 to this
+document.
+
+ 2. This patch revision implements the solution documented in that e-mail. If
+non-http URLs are used by the web_page_url of a list an additional htdig
+configuration file for use by htsearch is generated.
+
+ 3. In all other respects the operation of the Mailman-htdig integration
+remains unchanged. There is no benefit in upgrading to this revised patch unless
+you need to use other than http addressing in your DEFAULT_URL or set other than
+http addressing in the web_page_url configuration of any of your lists.
+
+ 4. If changing to or from a non-http addressing scheme then the per list
+htdig config files of the lists affected and their associated htdig indices must
+be reconstructed. See the section below entitled 'Changing the Addressing Scheme
+of your web_page_url' for details of how to do this.
+
+htdig-2.0.6-0.3.patch:
+
+ 1. adds support for remote htdig, that is: running htdig on a different
+system to Mailman.
+
+ 2. enhances the configurability of the integration. Some of the programmed
+assumptions made in previous versions are now configurable in mm_cfg.py. The
+configuration variables concerned default to the previous fixed values so
+that this version is backwards compatible with earlier versions.
+
+ 3. does some minor cosmetic code changes.
+
+ 4. extends the associated documentation.
+
+Introduction
+============
+
+This integration enables use of the htdig (http://www.htdig.org) search engine
+for searching mail list archives produced by pipermail, Mailman's built-in
+archiver.
+
+You can use htdig without applying these patches to Mailman but you may find it
+awkward to achieve some of the features offered by this patch.
+
+The main features of the patch are:
+
+ 1. per list search facility with a search form on each list's TOC page.
+
+ 2. maintenance of privacy of private archives. The user has to establish
+their credentials via the normal private archive access mechanism before any
+access via htdig is allowed.
+
+ 3. a common base URL for both public and private archive access via htsearch
+results. This means that htdig indices are unaffected by changing an archive
+from private to public and vice versa. All access to archives via htdig is
+controlled by a wrapped CGI script called htdig.py.
+
+ 4. Choice of running htdig on the machine running Mailman (aka local htdig)
+or running htdig on another machine which has access to Mailman's archives
+via NFS or some similarly competent network file sharing scheme (aka remote
+htdig).
+
+ 5. cron activated scripts and crontab entry to run htdig regularly to
+maintain the per list search indices.
+
+ 6. automatic creation, deletion and maintenance of htdig configuration files
+and such. Beyond installing htdig and telling Mailman where it is via mm_cfg
+you do not have to do much other setup.
+
+Installing and Building Mailman with this patch
+==============================================
+
+Create your Mailman build directory in the normal way.
+
+You can apply the patch to either a fresh expansion of the Mailman source
+distribution or the one you used to build a currently working Mailman
+installation.
+
+Execute the following command in the Mailman build directory:
+
+ patch -p1 < htdig-2.0.9-0.1.patch
+
+Follow the configure and make procedures for regular Mailman as given in the
+$build/INSTALL file.
+
+Then follow the Mailman-htdig configuration instructions given below.
+
+What is Installed by the Patch
+==============================
+
+The patch amends:
+----------------
+
+$prefix/Mailman/Archiver/HyperArch.py
+
+ the changes in this file set up the per list htdig stuff such as config
+ files and add the search forms to the list TOC pages.
+
+$build/Mailman/Defaults.py.in
+
+ adds the default configuration variables needed to support the mailman-htdig
+ integration
+
+$build/cron/crontab.in.in
+
+ adds the nightly_htdig cron script to the default crontab
+
+$build/Makefile.in
+$build/cron/Makefile.in
+$build/src/Makefile.in
+$build/bin/Makefile.in
+
+ necessary changes to Makefiles used for installing Mailman
+
+The patch adds:
+--------------
+
+$prefix/cgi-bin/htdig
+$prefix/Mailman/Cgi/htdig.py
+
+ these are a CGI script and its wrapper, which is always on the path of URLs
+ returned from searches of htdig indices. The script provides secure access
+ to such URLs in the same way as $prefix/cgi-bin/private and
+ $prefix/Mailman/Cgi/private.py do. htdig.py ensures private archives are
+ kept private, applying the same criteria for permitting access as
+ private.py, and delivers material from public archives without demanding
+ any authentication.
+
+$prefix/bin/blow_away_htdig
+
+ this is a utility script for removing per list htdig data, e.g. the config
+ file and indices/db files. This is necessary when:
+
+ a. ceasing use of the Mailman-htdig integration
+
+ b. moving from local to remote htdig or vice-versa
+
+ c. upgrading to a version of htdig which has an incompatible
+ index/db file format
+
+ d. changing the addressing scheme (http versus https) in the
+ web_page_url configuration variable of a list
+
+$prefix/cron/nightly_htdig
+$prefix/cron/remote_nightly_htdig
+$prefix/cron/remote_nightly_htdig_noshare
+$prefix/cron/remote_nightly_htdig.pl
+
+ These scripts all do the same thing; they can be installed as a cron task
+ and run regularly to invoke htdig's rundig script to update mailing list
+ search indices. Only one of these scripts is used, the choice of which
+ depending on your system configuration.
+
+ nightly_htdig is used where Mailman and htdig run on the same system.
+
+ the remote_... scripts are used where Mailman and htdig live on different
+ systems. You choose which one suits your needs best:
+
+ remote_nightly_htdig uses the same python files on both systems, that is
+ the same .py and .pyc files are accessed, and it hence depends on
+ compatible bytecode between the Mailman system and htdig system. It also
+ accesses Mailman data files and depends on compatibility of data files
+ contents, for example pickled python values. This should work OK if the
+ same version of python is being run on both systems even where the
+ systems are heterogeneous, for example one is Sun/Solaris and the
+ other is PC/Linux.
+
+ remote_nightly_htdig_noshare shares no python files between the two
+ systems. While it is still written in python, it acquires information
+ from the file system using directory listings and stat operations.
+
+ remote_nightly_htdig.pl is a rewrite of remote_nightly_htdig_noshare in
+ Perl. It is for use where the htdig system does not have python
+ available on it: in which case, shame on you.
+
+$prefix/cgi-bin/updateTOC
+$prefix/Mailman/Cgi/updateTOC.py
+
+ these are a CGI script and its wrapper, for use where Mailman and htdig
+ live on different systems. The script is a work-around for the problem of
+ using remote_nightly_htdig, remote_nightly_htdig_noshare or
+ remote_nightly_htdig.pl which precludes these scripts from directly updating
+ the TOC page of each archived list. Instead, these scripts call this CGI
+ script to do that for them. This CGI script will not operate when entered as
+ a URL from a browser.
+
+Configuration of Mailman-htdig Integration
+==========================================
+
+Configuration of the Mailman-htdig integration is carried out on the Mailman
+side. While you must have to hand some information about your htdig
+installation, you should not have to tinker with htdig for the integration to
+work.
+
+Most of the configuration of the integration is done by values assigned to
+python variables in either $prefix/Mailman/Defaults.py or
+$prefix/Mailman/mm_cfg.py.
+
+If you opt to run htdig on a different machine or under a different HTTP server
+to the one running the HTTP server which provides Mailman's Web UI you will also
+have to edit whichever of the patch's three htdig related cron scripts you opt
+to run (remote_nightly_htdig, remote_nightly_htdig_noshare, or
+remote_nightly_htdig.pl) to add a small amount of configuration information.
+
+Health Warning on the packet!
+-----------------------------
+
+Be careful when editing configuration information in $prefix/Mailman/mm_cfg.py,
+the only Mailman config file you should be editing. Check, double check and then
+recheck before going ahead. If you get either variable names or their values
+wrong a lot of confusion in the operation of both Mailman and htdig can result.
+You (and others supporting you) can spend hours trying to identify problems and
+looking for non-existent bugs as a consequence of such editing errors. Expect to
+find errors in these instructions; compensate for them and tell me when you do
+(r.barrett@ftel.co.uk).
+
+Also do read the htdig documentation, release notes etc. This patch integrates a
+working htdig with htsearch available through CGI. These notes are about Mailman
+and integrating it with that working htdig. It is up to you to sort out the
+htdig end of things.
+
+Starting from Scratch (Again)
+-----------------------------
+
+This is getting ahead of things but some of you may already be asking "What if
+I've already been using an older version of this patch and want to start
+afresh", or "I want to change from local to remote htdig or vice versa".
+
+In these cases your friend will be the $prefix/bin/blow_away_htdig script. It
+removes existing htdig related stuff out of your Mailman installation to the
+extent that it was added by this patch and added to by the normal operation of
+pipermail and nightly_htdig. With that removed and a revised Mailman
+configuration, the patched code will start rebuilding the htdig data.
+
+But before you get carried away with blow_away_htdig, read the rest of these
+notes.
+
+General
+-------
+This patch adds a number of default variables to the file
+$prefix/Mailman/Defaults.py that affect operation of the Mailman-htdig
+integration. These are in addition to the standard Mailman defaults in that
+file. If, in the light of what is said below, you decide any of these are
+incorrect, you can override them in $prefix/Mailman/mm_cfg.py [NOT IN
+Defaults.py! See the comments in Defaults.py for details].
+
+By default the Mailman-htdig integration is NOT ENABLED by the installation of
+this patch; a default variable in Defaults.py turns off the operation of the
+integration. You have to actively override that default in mm_cfg.py to turn on
+operation of the integration.
+
+Once a list is created, changing most of these variables will have either no
+effect or a bad effect. You will need to run $prefix/bin/blow_away_htdig script
+and/or $prefix/bin/arch to rebuild the archive pages if you make significant
+changes to the Mailman-htdig integration configuration variables.
+
+The install process will not overwrite an existing mm_cfg.py file so you can
+freely make changes to this file. If you are re-installing a later version of
+this patch you may have to change what is already configured in the existing
+file and, if necessary, add extra configuration variables to it.
+
+Most of the Mailman-htdig control variables default to sensible values which you
+will not need to change, especially if you are using local htdig. The semantics
+of most variables apply to both local and remote htdig operation but with some
+the values assigned will depend on whether htdig is viewing things from the same
+or a remote machine.
+
+The first two variables control what is indexed by htdig. The values assigned
+are both embedded in the HTML generated by pipermail in the list archives and
+added. Changing the values of these variables will mean that all previously
+generated HTML pages in list archives will be out of date and you will probably
+want to rebuild existing archives using $prefix/bin/arch:
+
+ARCHIVE_INDEXING_ENABLE
+
+ defines a string telling htdig that it should look at the following material
+ when building its indices.
+
+ Default: ARCHIVE_INDEXING_ENABLE = ''
+
+ARCHIVE_INDEXING_DISABLE
+
+ defines a string telling htdig that it should not look at the following
+ material when building its indices.
+
+ Default: ARCHIVE_INDEXING_DISABLE = ''
+
+USE_HTDIG - Semantics 0 - don't use integrated htdig, 1 - use it
+
+ turns Mailman-htdig integration on or off.
+
+ Default: USE_HTDIG = 0
+
+ Notes:
+
+ 1. when USE_HTDIG is turned on the patched code in Mailman will start adding
+ htdig stuff for any archiving-enabled mail lists as new posts for each
+ list are handled by Mailman. Until a new post is made after enabling with
+ USE_HTDIG an existing mail list's archive will not be htdig searchable.
+ When the new post is handled:
+
+ a. the list's personalised htdig config file is created
+
+ b. necessary links to the htdig config file are created
+
+ c. a search form is added to the TOC page for the list
+
+ Even with this done, htdig searches only become available when htdig
+ indices are constructed. This is done when one or other of the patch's
+ htdig related cron scripts are run (nightly_htdig, remote_nightly_htdig,
+ remote_nightly_htdig_noshare, or remote_nightly_htdig.pl, depending on
+ how you configure your system). These can be run from the command line
+ ahead of their scheduled cron time to get htdig searches operational.
+
+ 2. Turning USE_HTDIG off will not remove htdig indices or search forms from
+ existing archive-enabled lists. It will however stop htdig features from
+ being added to newly created lists. If you want to eliminate htdig from
+ your existing lists then use the $prefix/bin/blow_away_htdig script.
+
+HTDIG_ARCHIVE_URL
+
+ this is the URL path that equates to the wrapper $prefix/cgi-bin/htdig which
+ controls access to the $prefix/Mailman/Cgi/htdig.py script.
+
+ Default: HTDIG_ARCHIVE_URL = '/mailman/htdig'
+
+ It is highly unlikely that you will want to change from the default value
+ unless you are also changing other variables such as PRIVATE_ARCHIVE_URL
+ because of some non-standard installation decisions on your part.
+
+HTDIG_SEARCH_URL
+
+ this is the URL of the htsearch CGI program part of the htdig package.
+
+ Default: HTDIG_SEARCH_URL = '/cgi-bin/htsearch'
+
+ The default assumes a single HTTP server providing access to htdig and to
+ Mailman's web UI are on the Mailman machine and htsearch has been installed
+ in the HTTP server's cgi-bin directory. This value will depend on your htdig
+ installation decisions and HTTP server configuration files (typically
+ /etc/httpd/httpd.conf on a late model Apache installation) i.e. the
+ ScriptAlias through which the htsearch CGI program is reached.
+
+HTDIG_FILES_URL
+
+ this is the URL of the directory containing various HTML and Graphics files
+ installed by htdig; files such as buttonr.gif, buttonl.gif and
+ button1-10.gif. The URL must end with a '/'.
+
+ Default: HTDIG_FILES_URL = '/htdig/'
+
+ The default assumes the HTTP servers providing access to htdig and to
+ Mailman's web UI are on the same machine and a symbolic link called 'htdig'
+ has been put into your HTTP server's top level HTML directory which points
+ to the directory your htdig install has put the actual files into; this link
+ is often to /usr/share/htdig. This value will depend on your htdig
+ installation decisions and HTTP server's configuration files (typically
+ /etc/httpd/httpd.conf on a late model Apache installation) i.e. the Alias
+ through which the link to the htdig files is reached.
+
+HTDIG_CONF_LINK_DIR
+
+ this is the name of a directory in which links to list specific htdig config
+ files are placed.
+
+ Default: HTDIG_CONF_LINK_DIR = os.path.join(VAR_PREFIX, 'archives', 'htdig')
+
+ The VAR_PREFIX of the default is resolved to an actual file system path when
+ Mailman's 'make install' is run. The 'os.path.join' creates a full file
+ system path by gluing together the three pieces when Mailman is run. This
+ definition puts the directory alongside the default PUBLIC_ARCHIVE_FILE_DIR
+ and PRIVATE_ARCHIVE_FILE_DIR. Unless you are changing the value of these
+ variables you probably do not want to change HTDIG_CONF_LINK_DIR.
+
+HTDIG_RUNDIG_PATH
+
+ this is the path in your file system to the rundig shell script that is
+ installed as part of htdig. This tells one or other of the patch's htdig
+ related cron scripts (nightly_htdig and remote_nightly_htdig) where to find
+ rundig in order that they can execute it.
+
+ Default: HTDIG_RUNDIG_PATH = '/usr/local/bin/rundig'
+
+HTDIG_MAILMAN_LINK
+
+ the value of this is the name of a symbolic link you must create in the
+ directory where htdig expects to find its configuration files. The target of
+ this link is the directory whose path is the value of HTDIG_CONF_LINK_DIR.
+ The value of this variable is embedded in the per list search forms in each
+ list's TOC page generated by the patched code, where it tells htsearch where
+ to find the list's htdig config file.
+
+ Default: HTDIG_MAILMAN_LINK = 'htdig-mailman'
+
+REMOTE_HTDIG - Semantics 0 - htdig runs on local machine, 1 - on remote machine
+
+ says whether htdig is run on the same machine as Mailman or on another
+ machine.
+
+ Default: REMOTE_HTDIG = 0
+
+REMOTE_PRIVATE_ARCHIVE_FILE_DIR
+
+ only relevant if REMOTE_HTDIG = 1. It is the file system path to the
+ directory in which Mailman stores private archives, as seen by the machine
+ running htdig.
+
+ Default: REMOTE_PRIVATE_ARCHIVE_FILE_DIR = os.path.join(VAR_PREFIX,
+ 'archives', 'private')
+
+ The VAR_PREFIX of the default is resolved to an actual file system path when
+ Mailman's 'make install' is run. The 'os.path.join' creates a full file
+ system path by gluing together the three pieces when Mailman is run. If you
+ assign a value to this in mm_cfg.py, just put the relevant explicit file
+ system path in.
+
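Illustrative note (not part of the patch): a minimal sketch of how the
os.path.join defaults for HTDIG_CONF_LINK_DIR and
REMOTE_PRIVATE_ARCHIVE_FILE_DIR resolve. The VAR_PREFIX value /home/mailman is
an assumption borrowed from the ln example in these notes; your 'make install'
may substitute a different path. posixpath.join is used so that the result
matches what os.path.join gives on a Unix system.

```python
import posixpath

# Assumed value: VAR_PREFIX is filled in by Mailman's 'make install';
# /home/mailman is only an example path.
VAR_PREFIX = '/home/mailman'

# posixpath.join glues the pieces together the way os.path.join does on Unix.
HTDIG_CONF_LINK_DIR = posixpath.join(VAR_PREFIX, 'archives', 'htdig')
REMOTE_PRIVATE_ARCHIVE_FILE_DIR = posixpath.join(VAR_PREFIX, 'archives', 'private')

print(HTDIG_CONF_LINK_DIR)
print(REMOTE_PRIVATE_ARCHIVE_FILE_DIR)
```

For remote htdig the second value would normally be overridden in mm_cfg.py
with the path as seen from the htdig machine, as described in the entry above.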
+Local htdig Configuration
+-------------------------
+
+This configuration is for when you are running Mailman, htdig, the HTTP server
+used to provide Mailman's web UI and htdig's htsearch CGI script, on the same
+machine.
+
+You will need to:
+
+ 1. Set up a symbolic link in the directory where htdig expects to find its
+ configuration files; this depends on how you configured and installed
+ htdig but it is usually the directory containing htdig's default
+ htdig.conf file. The target of this link is the directory whose path is
+ assigned as the value of HTDIG_CONF_LINK_DIR. The name of the link must
+ be the same as the value you assign to HTDIG_MAILMAN_LINK. For example, use
+ the command:
+
+ ln -s /home/mailman/archives/htdig /etc/htdig-mailman
+
+ 2. If different to the default value, add the definition of
+ HTDIG_MAILMAN_LINK to file $prefix/Mailman/mm_cfg.py.
+
+ 3. If different to the default value, add the definition of
+ HTDIG_RUNDIG_PATH to file $prefix/Mailman/mm_cfg.py.
+
+ 4. Add the definition of USE_HTDIG with the value 1 to
+ $prefix/Mailman/mm_cfg.py.
+
+ USE_HTDIG = 1
+
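Illustrative note (not part of the patch): a small sketch of how the symbolic
link from step 1 could be checked before turning USE_HTDIG on. The helper name
link_points_to is hypothetical, and the scratch directory stands in for the
real htdig configuration directory and Mailman archive directory, so nothing
outside a temporary location is touched.

```python
import os
import tempfile

def link_points_to(link_path, target_dir):
    """Return True if link_path is a symbolic link that resolves to target_dir."""
    if not os.path.islink(link_path):
        return False
    return os.path.realpath(link_path) == os.path.realpath(target_dir)

# Demonstration in a scratch directory (assumed layout, not the real paths).
scratch = tempfile.mkdtemp()
conf_link_dir = os.path.join(scratch, 'archives', 'htdig')  # HTDIG_CONF_LINK_DIR
os.makedirs(conf_link_dir)
link = os.path.join(scratch, 'htdig-mailman')               # HTDIG_MAILMAN_LINK
os.symlink(conf_link_dir, link)                             # same effect as ln -s

ok = link_points_to(link, conf_link_dir)
missing = link_points_to(os.path.join(scratch, 'no-such-link'), conf_link_dir)
print(ok, missing)
```

On a real system the same helper could be called with the two paths used in
the ln command of step 1.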
+
+If necessary you can override the values of any of the other configuration
+variables in file $prefix/Mailman/mm_cfg.py. In particular you might need to
+change the following URL variables from their defaults: HTDIG_SEARCH_URL and
+HTDIG_FILES_URL.
+
+These URLs can be just the path i.e. absolute URL on the same server as that
+which serves Mailman's Web UI, or a full URL identifying the protocol (http),
+server, server port and path, for example
+http://mailer.your.com:8080/cgi-bin/htdig/htsearch.
+
+Remote htdig Configuration
+--------------------------
+
+This configuration is for when you are running htdig and an HTTP server
+providing access to htsearch on a different machine to that running Mailman and
+the HTTP server used to provide Mailman's web interface.
+
+For this configuration to work, htdig's programs, both those run from command
+lines such as rundig and those run via CGI such as htsearch, must be able to see
+Mailman archives through NFS. In the examples below we'll assume that
+/mnt/mailman-archives on the htdig machine maps to $prefix/mailman/archives on
+the Mailman machine.
+
+You should also arrange for the mailman UID and its GID to be common to both
+machines. Remember that when rundig is called on the htdig machine to produce
+search indices for each list it will be trying to write those files via NFS in
+Mailman's archive area and will thus need to run with an appropriate identity
+and permissions.
+
+The differences between the local and remote configuration are:
+
+ 1. configuration values telling htdig where to find files are as viewed from
+ the remote machine.
+
+ 2. configuration values giving URLs that refer to htdiggy things have to be
+ as viewed from the Mailman machine.
+
+You will need to:
+
+ 1. Set up a symbolic link in the directory where htdig expects to find its
+ configuration files; this depends on how you configured and installed
+ htdig but it is usually the directory containing htdig's default
+ htdig.conf file. The target of this link is the directory whose path is
+ assigned as the value of HTDIG_CONF_LINK_DIR as seen from the remote
+ machine running htdig. The name of the link must be the same as the value
+ you assign to HTDIG_MAILMAN_LINK. For example, use the command:
+
+ ln -s /mnt/mailman-archives/htdig /etc/htdig-mailman
+
+ 2. Add the definition of HTDIG_MAILMAN_LINK to file
+ $prefix/Mailman/mm_cfg.py. For example:
+
+ HTDIG_MAILMAN_LINK = 'htdig-mailman'
+
+ 3. Add the definition of HTDIG_RUNDIG_PATH to file
+ $prefix/Mailman/mm_cfg.py. This is the path to rundig on the remote machine
+ running htdig. For example:
+
+ HTDIG_RUNDIG_PATH = '/usr/local/bin/rundig'
+
+ 4. Add the definition of HTDIG_SEARCH_URL to file $prefix/Mailman/mm_cfg.py.
+ This must be a full URL referring to the htsearch CGI program on the
+ remote htdig machine, as seen from the Mailman local machine. For
+ example:
+
+ HTDIG_SEARCH_URL = 'http://htdiggy.your.com/cgi-bin/htsearch'
+
+ 5. Add the definition of HTDIG_FILES_URL to file $prefix/Mailman/mm_cfg.py.
+ This must be a full URL referring to the directory containing htdig files
+ on the remote htdig machine as seen from the Mailman local machine. This
+ URL must end with a '/'. For example:
+
+ HTDIG_FILES_URL = 'http://htdiggy.your.com/htdig/'
+
+ 6. Add the definition of REMOTE_PRIVATE_ARCHIVE_FILE_DIR to
+ $prefix/Mailman/mm_cfg.py. This must be the absolute file system path to
+ the directory in which Mailman stores private archives as seen by the
+ machine running htdig. For example:
+
+ REMOTE_PRIVATE_ARCHIVE_FILE_DIR = '/mnt/mailman-archives/private'
+
+ 7. Add the definition of USE_HTDIG with the value 1 to
+ $prefix/Mailman/mm_cfg.py.
+
+ USE_HTDIG = 1
+
+ 8. Add the definition of REMOTE_HTDIG with the value 1 to
+ $prefix/Mailman/mm_cfg.py.
+
+ REMOTE_HTDIG = 1
+
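Illustrative note (not part of the patch): the remote settings from steps 2 to
8 gathered into one mm_cfg.py style fragment, followed by a rough sanity check
of the rules stated in those steps. The host name htdiggy.your.com and the
/mnt/mailman-archives path are the example values used above, and the function
name remote_config_problems is hypothetical.

```python
# Example overrides for remote htdig, using the values from the steps above.
HTDIG_MAILMAN_LINK = 'htdig-mailman'
HTDIG_RUNDIG_PATH = '/usr/local/bin/rundig'
HTDIG_SEARCH_URL = 'http://htdiggy.your.com/cgi-bin/htsearch'
HTDIG_FILES_URL = 'http://htdiggy.your.com/htdig/'
REMOTE_PRIVATE_ARCHIVE_FILE_DIR = '/mnt/mailman-archives/private'
USE_HTDIG = 1
REMOTE_HTDIG = 1

def remote_config_problems(cfg):
    """Return messages for settings that break the rules given in the steps."""
    problems = []
    # Steps 4 and 5: both URLs must be full URLs for remote htdig.
    for name in ('HTDIG_SEARCH_URL', 'HTDIG_FILES_URL'):
        if '://' not in cfg[name]:
            problems.append(name + ' must be a full URL')
    # Step 5: the files URL must end with a slash.
    if not cfg['HTDIG_FILES_URL'].endswith('/'):
        problems.append("HTDIG_FILES_URL must end with a '/'")
    # Step 6: the private archive path must be absolute.
    if not cfg['REMOTE_PRIVATE_ARCHIVE_FILE_DIR'].startswith('/'):
        problems.append('REMOTE_PRIVATE_ARCHIVE_FILE_DIR must be an absolute path')
    # Steps 7 and 8: both switches must be turned on.
    for name in ('USE_HTDIG', 'REMOTE_HTDIG'):
        if cfg[name] != 1:
            problems.append(name + ' must be 1')
    return problems

settings = {
    'HTDIG_SEARCH_URL': HTDIG_SEARCH_URL,
    'HTDIG_FILES_URL': HTDIG_FILES_URL,
    'REMOTE_PRIVATE_ARCHIVE_FILE_DIR': REMOTE_PRIVATE_ARCHIVE_FILE_DIR,
    'USE_HTDIG': USE_HTDIG,
    'REMOTE_HTDIG': REMOTE_HTDIG,
}
problems = remote_config_problems(settings)
print(problems)
```

Only the assignments at the top belong in mm_cfg.py; the check is a
convenience for catching the editing errors warned about earlier in these
notes and is not something Mailman itself runs.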
+You have to choose one of the three remote_nightly_htdig scripts found in
+$prefix/cron - remote_nightly_htdig, remote_nightly_htdig_noshare and
+remote_nightly_htdig.pl - and transfer it to the htdig machine. See above under
+heading "What is Installed by the Patch/The patch adds" for an explanation
+of the differences between these scripts, which all do the same basic job. You
+should add the script to the crontab for the mailman UID on the htdig machine.
+But first you need to edit the selected script to add some configuration
+information. What has to be added depends on which script you opt to use. In
+each case the variables concerned are declared near the top of the script and
+you just have to enter the appropriate values:
+
+ remote_nightly_htdig
+ you only need to set the value of the python variable MAILMAN_PATH to be
+ the directory $prefix as seen from the htdig machine. The whole Mailman
+ installation must be accessible via NFS in order to use this script.
+
+ remote_nightly_htdig_noshare
+ you need to copy the values for the following configuration
+ variables from either $prefix/Mailman/mm_cfg.py or
+ $prefix/Mailman/Defaults.py to the script: DEFAULT_URL,
+ REMOTE_PRIVATE_ARCHIVE_FILE_DIR, HTDIG_RUNDIG_PATH. The variables
+ declared in remote_nightly_htdig_noshare use the same names. This script
+ only requires that the archives directory of the Mailman installation be
+ accessible via NFS.
+
+ Note: DEFAULT_URL is not a Mailman-htdig integration specific
+ configuration variable. In most installations DEFAULT_URL is set up
+ automatically by the 'make install' in $prefix/Mailman/Defaults.py and
+ not usually overridden in $prefix/Mailman/mm_cfg.py. You should find it
+ defined near the top of Defaults.py.
+
+ remote_nightly_htdig.pl
+ you need to copy the values for the following configuration
+ variables from either $prefix/Mailman/mm_cfg.py or
+ $prefix/Mailman/Defaults.py to the script: DEFAULT_URL,
+ REMOTE_PRIVATE_ARCHIVE_FILE_DIR, HTDIG_RUNDIG_PATH. Being a Perl script,
+ the variables in remote_nightly_htdig.pl use the same names but prefixed
+ with the '$' character. This script only requires that the archives
+ directory of the Mailman installation be accessible via NFS.
+
+ Note 1: DEFAULT_URL is not a Mailman-htdig integration specific
+ configuration variable. In most installations DEFAULT_URL is set up
+ automatically by the 'make install' in $prefix/Mailman/Defaults.py and
+ not usually overridden in $prefix/Mailman/mm_cfg.py. You should find it
+ defined near the top of Defaults.py.
+
+ Note 2: You may need to change the '#! /usr/bin/env perl' on the first
+ line of this script if that doesn't find your Perl executable. You may
+ also need to verify the Perl packages used by this script are installed
+ on your system.
+
+As with the nightly_htdig script when running with local htdig, these scripts
+can be run from the command line using the mailman UID in order to get htdig to
+construct an initial set of indices.
+
+Upgrading an Existing Standard Mailman Installation
+---------------------------------------------------
+
+You will want to suspend operation of Mailman while doing the upgrade. Consider
+doing a shutdown of the MTA delivering mail to Mailman and removing Mailman's
+crontab.
+
+Configure and install as described above.
+
+Restart Mailman's crontab and restart your MTA's delivery to Mailman.
+
+If your installation already has archives:
+
+ 1. Send a message to each of your archive-enabled lists. This will stimulate
+ the setup of the new per list htdig config files in the Mailman archives.
+
+ 2. Consider rebuilding your existing archives with $prefix/bin/arch. This
+ will embed the ARCHIVE_INDEXING_ENABLE and ARCHIVE_INDEXING_DISABLE in
+ the regenerated archive pages and, after nightly_htdig has been run, give
+ improved search results.
+
+ 3. Run the nightly_htdig script from the command line to generate a new set
+ of per list htdig search indices.
+
+Changing from local to remote htdig or vice versa
+-------------------------------------------------
+
+You will want to suspend operation of Mailman while making this change. Consider
+doing a shutdown of the MTA delivering mail to Mailman and removing Mailman's
+crontab.
+
+Run the $prefix/bin/blow_away_htdig script to remove all existing per list htdig
+config files and htdig indices/db files.
+
+Configure per the instructions above for the local or remote target.
+
+Restart Mailman's crontab and restart your MTA's delivery to Mailman.
+
+Send a message to each of your archive-enabled lists. This will stimulate the
+set up of the new per list htdig config files in Mailman archives.
+
+Run the nightly_htdig script from the command line to generate a new set of per
+list htdig search indices.
+
+Coping with htdig Upgrades
+--------------------------
+
+If you change the version of htdig you run, you may find that the indices built
+with the earlier version are not compatible with the newer version of htdig's
+programs. In that case do the following:
+
+ 1. You will want to suspend operation of Mailman while making this change.
+ Consider doing a shutdown of the MTA delivering mail to Mailman and
+ removing Mailman's crontab.
+
+ 2. Run the $prefix/bin/blow_away_htdig script with the -i flag to remove all
+ existing per list htdig indices/db files.
+
+ 3. Restart Mailman's crontab and restart your MTA's delivery to Mailman.
+
+ 4. Run the nightly_htdig script from the command line to generate new sets
+ of per list htdig search indices.
+
+Changing the Addressing Scheme of your web_page_url
+---------------------------------------------------
+
+If you change the addressing scheme of the web_page_url for a list to or from
+http then you will need to rebuild the list's htdig configuration file(s) and
+the related htdig indices. Do the following:
+
+ 1. You may want to suspend operation of Mailman while making this change.
+ Consider shutting down the MTA that delivers mail to Mailman and
+ removing Mailman's crontab.
+
+ 2. Run the $prefix/bin/blow_away_htdig script to remove all existing per
+ list htdig material for the list(s) concerned.
+
+ 3. Restart Mailman's crontab and restart your MTA's delivery to Mailman.
+
+ 4. Send a message to each affected list to provoke reconstruction of the
+ list's htdig config file(s).
+
+ 5. Run the nightly_htdig script from the command line to generate new sets
+ of per list htdig search indices.
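+
+The rebuild is needed because htdig can only build indices over http, so a list
+whose URL uses another scheme gets one config file for digging and a second one
+for htsearch. A minimal Python 3 sketch of that decision follows. The patch
+itself uses the older urlparse module; the function name dig_and_search_urls and
+the host lists.example.com are assumptions made for illustration.
+

```python
from urllib.parse import urlparse, urlunparse


def dig_and_search_urls(start_url):
    """Return (dig_url, search_url, needs_dual_conf) for a start URL.

    The digging URL is forced to http; the search URL keeps the
    scheme that the list's web pages are actually served with.
    """
    parts = urlparse(start_url)
    if parts.scheme == "http":
        return start_url, start_url, False
    dig_url = urlunparse(parts._replace(scheme="http"))
    return dig_url, start_url, True


if __name__ == "__main__":
    # lists.example.com is a placeholder host, not taken from the patch.
    print(dig_and_search_urls("https://lists.example.com/mailman/htdig/test/"))
```

+
+When the third value is true, both <listname>.conf and <listname>.htsearch.conf
+are written, which is why a scheme change invalidates the existing files.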
+
+
+Operational Information
+=======================
+
+If you have just turned USE_HTDIG on or just used $prefix/bin/blow_away_htdig
+(without the -i flag) there will initially be no per list htdig information
+saved in the archives.
+
+When the first post to each archive-enabled list is archived by pipermail, the
+per list htdig config file will be constructed and some directories and links
+added to your Mailman archive directories. The htdig search form will be added
+to the list's TOC page.
+
+However, no htdig indices are constructed until one of the nightly_htdig scripts
+has run. You can either wait for the script to run as a cron job or run it from
+the command line (under the mailman UID).
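+
+The three states described above can be told apart by looking in a list's
+archive directory. The Python 3 sketch below is one way to do that. It assumes
+that the rundig_last_run file consulted by the TOC code exists only once indices
+have been built; the function name htdig_state and the state labels it returns
+are hypothetical.
+

```python
import os
import tempfile


def htdig_state(archive_dir, listname):
    """Classify how far htdig set-up has progressed for one list."""
    htdig_dir = os.path.join(archive_dir, "htdig")
    conf_file = os.path.join(htdig_dir, listname + ".conf")
    last_run = os.path.join(htdig_dir, "rundig_last_run")
    if not os.path.isfile(conf_file):
        # No post has been archived since USE_HTDIG was turned on.
        return "awaiting first post"
    if not os.path.isfile(last_run):
        # Config exists but no index build has been recorded yet.
        return "awaiting nightly_htdig"
    return "indexed"


if __name__ == "__main__":
    # Exercise the function on a throwaway directory.
    with tempfile.TemporaryDirectory() as archive_dir:
        print(htdig_state(archive_dir, "test"))
```
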
+
+Notes and Warnings
+==================
+
+Red Hat 7.1 and 7.2 installations:
+
+ If you install htdig from the htdig-3.2.0 binary rpm of RH7.1/2 Binary CD 1
+ of 2 you also have to install the htdig-web-3.2.0 binary rpm. This may be
+ from RH 7.1/2 Binary CD 2 of 2 or CD 1 of 2 depending on whether you are
+ using actual CDs or downloaded CD images.
+
+Apache/htdig issues
+
+ The htsearch CGI script part of htdig and some associated HTML and graphics
+ files must be accessible via your web server and the Mailman configuration
+ variables HTDIG_SEARCH_URL and HTDIG_FILES_URL set up accordingly. Depending
+ on how you install htdig and Apache you may need to add Alias and/or
+ ScriptAlias directives to your Apache configuration file to make the htdig
+ components accessible. Check the Apache and htdig documentation.
+
+Contributors
+============
+
+Original author and maintainer: Richard Barrett - r.barrett@ftel.co.uk
+
+Past bug fixes: Nigel Metheringham
+ To search this archive fill in the following form: +
++
+ ++ Note: The archive search index was last rebuilt at + %(lastrun)s. Any postings after that will not be found by + a search. Index rebuild is usually done once every 24 hours for + this list. You can use the "View by date" link below for the + most recent postings. +
'''
+
+htdig_conf_template = '''\
+#
+# Taken from the example config file for ht://Dig, with most comments excised
+# See the htdig.conf from the distribution you have installed
+#
+database_dir: %(databases)s
+start_url: %(starturl)s
+limit_urls_to: ${start_url}
+local_urls: %(urlpath)s=%(filepath)s
+local_urls_only: true
+url_part_aliases: %(url_part_aliases)s
+noindex_end: %(indexing_enable)s
+noindex_start: %(indexing_disable)s
+exclude_urls: /cgi-bin/ .cgi
+bad_extensions: .wav .gz .z .sit .au .zip .tar .hqx .exe .com .gif \
+                .jpg .jpeg .aiff .class .map .ram .tgz .bin .rpm .mpg .mov .avi
+maintainer: %(maintainer)s
+max_head_length: 10000
+max_doc_size: 200000
+no_excerpt_show_top: true
+search_algorithm: exact:1 synonyms:0.5 endings:0.1
+template_map: Long long ${common_dir}/long.html \
+              Short short ${common_dir}/short.html
+template_name: short
+next_page_text:
+no_next_page_text:
+prev_page_text:
+no_prev_page_text:
+page_number_text: '' \
+                  '' \
+                  '' \
+                  '' \
+                  '' \
+                  '' \
+                  '' \
+                  '' \
+                  '' \
+                  ''
+no_page_number_text: '' \
+                     '' \
+                     '' \
+                     '' \
+                     '' \
+                     '' \
+                     '' \
+                     '' \
+                     '' \
+                     ''
+'''
+
 class HyperArchive(pipermail.T):
     __super_init = pipermail.T.__init__
@@ -597,6 +693,9 @@
         self._lock_file = None
         self._charsets = {}
         self.charset = None
+
+        if mm_cfg.USE_HTDIG:
+            self.setup_htdig()
 
         if hasattr(self.maillist,'archive_volume_frequency'):
             if self.maillist.archive_volume_frequency == 0:
@@ -613,6 +712,7 @@
     html_hdr_tmpl = index_header_template
     html_foot_tmpl = index_footer_template
     html_TOC_tmpl = TOC_template
+    html_TOC_htsearch_tmpl = TOC_htsearch_template
     TOC_entry_tmpl = TOC_entry_template
     arch_listing_start = arch_listing_start
     arch_listing_end = arch_listing_end
@@ -667,6 +767,7 @@
              "listinfo": self.maillist.GetScriptURL('listinfo', absolute=1),
              "fullarch": '../%s.mbox/%s.mbox' % (listname, listname),
              "size": sizeof(mbox),
+             "htsearch": '',
              "indexing_enable": mm_cfg.ARCHIVE_INDEXING_ENABLE,
              "indexing_disable": mm_cfg.ARCHIVE_INDEXING_DISABLE,
              }
@@ -679,6 +780,25 @@
             d["noarchive_msg"] = ""
             d["archive_listing_start"] = self.arch_listing_start
             d["archive_listing_end"] = self.arch_listing_end
+            if mm_cfg.USE_HTDIG:
+                list_htdig_dir = os.path.join(self.maillist.archive_dir(), 'htdig')
+                rundig_file = os.path.join(list_htdig_dir, 'rundig_last_run')
+                if os.path.exists(rundig_file) and os.path.isfile(rundig_file):
+                    last_rundig = time.localtime(os.stat(rundig_file)[ST_MTIME])
+                    lastrun = time.strftime("%A, %d %b %Y %H:%M:%S %Z", last_rundig)
+                else:
+                    lastrun = '[has yet to be built for this new list]'
+                h = {"listname": self.maillist.internal_name(),
+                     "htconfdir": mm_cfg.HTDIG_MAILMAN_LINK,
+                     "htsearchcgi": mm_cfg.HTDIG_SEARCH_URL,
+                     "lastrun": lastrun,
+                     "htsearchconf": '',
+                     }
+                conf_name_search = self.maillist.internal_name() + '.htsearch.conf'
+                conf_file_search = os.path.join(list_htdig_dir, conf_name_search)
+                if os.path.exists(conf_file_search):
+                    h['htsearchconf'] = '.htsearch'
+                d["htsearch"] = self.html_TOC_htsearch_tmpl % h
             accum = []
             for a in self.archives:
                 accum.append(self.html_TOC_entry(a))
@@ -686,6 +806,108 @@
         if not d.has_key("encoding"):
             d["encoding"] = ""
         return self.html_TOC_tmpl % d
+
+    def remove_htdig(self, indices_only):
+        list_htdig_dir = os.path.join(self.maillist.archive_dir(), 'htdig')
+        if not os.path.exists(list_htdig_dir):
+            return
+        conf_name_dig = self.maillist.internal_name() + '.conf'
+        conf_file_dig = os.path.join(list_htdig_dir, conf_name_dig)
+        conf_name_search = self.maillist.internal_name() + '.htsearch.conf'
+        conf_file_search = os.path.join(list_htdig_dir, conf_name_search)
+        dual_conf_files = None
+        if os.path.exists(conf_file_search):
+            dual_conf_files = 1
+        if indices_only:
+            cfd = open(conf_file_dig, 'r')
+            conf_data_dig = cfd.readlines()
+            cfd.close()
+            if dual_conf_files:
+                cfd = open(conf_file_search, 'r')
+                conf_data_search = cfd.readlines()
+                cfd.close()
+            os.system('rm -rf ' + list_htdig_dir + '/*')
+            cfd = open(conf_file_dig, 'w')
+            cfd.writelines(conf_data_dig)
+            cfd.close()
+            if dual_conf_files:
+                cfd = open(conf_file_search, 'w')
+                cfd.writelines(conf_data_search)
+                cfd.close()
+        else:
+            os.system('rm -rf ' + list_htdig_dir)
+            conf_file_link_dig = os.path.join(mm_cfg.HTDIG_CONF_LINK_DIR, conf_name_dig)
+            os.unlink(conf_file_link_dig)
+            if dual_conf_files:
+                conf_file_link_search = os.path.join(mm_cfg.HTDIG_CONF_LINK_DIR, conf_name_search)
+                os.unlink(conf_file_link_search)
+
+    def setup_htdig(self):
+        listname = self.maillist.internal_name()
+        # we want to make a directory to put the mail list's htdig stuff in
+        list_htdig_dir = os.path.join(self.maillist.archive_dir(), 'htdig')
+        # but we bug out if this has already been done
+        if os.path.exists(list_htdig_dir):
+            return
+        mkdir(list_htdig_dir)
+        # assemble the mapping for characterising the htdig config
+        htdigfiles = mm_cfg.HTDIG_FILES_URL
+        if mm_cfg.HTDIG_FILES_URL[-1] == '/':
+            htdigfiles = htdigfiles[:-1]
+        d = {'databases': list_htdig_dir,
+             "filepath": self.maillist.archive_dir() + '/',
+             "maintainer": mm_cfg.MAILMAN_OWNER,
+             "indexing_enable": mm_cfg.ARCHIVE_INDEXING_ENABLE,
+             "indexing_disable": mm_cfg.ARCHIVE_INDEXING_DISABLE,
+             "htdig_url": htdigfiles,
+             }
+        # we need to changes paths to be relative to file system of
+        # remote machine if we are not running htdig on mailman machine
+        if mm_cfg.REMOTE_HTDIG:
+            d['filepath'] = os.path.join(mm_cfg.REMOTE_PRIVATE_ARCHIVE_FILE_DIR,
+                                         listname + '/')
+            d['databases'] = os.path.join(d['filepath'], 'htdig')
+        # now the URL through which htdig access to the pipermail data will go
+        starturl_dig = starturl_search = self.maillist.GetScriptURL('htdig') + '/'
+        # we need to know if the addressing scheme for the URL as htdig cannot
+        # cope with other than http (https for instance) when building indices
+        # we'll need different conf files for htdig and htsearch in that case
+        dual_conf_files = None
+        urlbits = urlparse.urlparse(starturl_dig)
+        if urlbits[0] != 'http':
+            urlbits = ('http',) + urlbits[1:]
+            starturl_dig = urlparse.urlunparse(urlbits)
+            dual_conf_files = 1
+        # create htdig config files. we may need one for digging and another for
+        # searching if the addressing scheme is https these config files are slightly
+        # different we'll put the files in the directory we just created above
+        conf_name_dig = listname + '.conf'
+        d['url_part_aliases'] = starturl_dig + " *mm-htdig*"
+        d['starturl'] = starturl_dig
+        d['urlpath'] = starturl_dig
+        conf_file_dig = os.path.join(list_htdig_dir, conf_name_dig)
+        fd = open(conf_file_dig, 'w')
+        fd.write(htdig_conf_template % d)
+        fd.close()
+        # we need symlinks so that htdig will be able to find the config files
+        conf_file_link_dig = os.path.join(mm_cfg.HTDIG_CONF_LINK_DIR, conf_name_dig)
+        if os.path.exists(conf_file_link_dig) and os.path.islink(conf_file_link_dig):
+            os.unlink(conf_file_link_dig)
+        os.symlink(conf_file_dig, conf_file_link_dig)
+        # make the second conf file and link to it for htsearch if necessary
+        if dual_conf_files:
+            conf_name_search = listname + '.htsearch.conf'
+            d['url_part_aliases'] = starturl_search + " *mm-htdig*"
+            d['starturl'] = starturl_search
+            d['urlpath'] = starturl_search
+            conf_file_search = os.path.join(list_htdig_dir, conf_name_search)
+            fd = open(conf_file_search, 'w')
+            fd.write(htdig_conf_template % d)
+            fd.close()
+            conf_file_link_search = os.path.join(mm_cfg.HTDIG_CONF_LINK_DIR, conf_name_search)
+            if os.path.exists(conf_file_link_search) and os.path.islink(conf_file_link_search):
+                os.unlink(conf_file_link_search)
+            os.symlink(conf_file_search, conf_file_link_search)
 
     def html_TOC_entry(self, arch):
         # Check to see if the archive is gzip'd or not
Only in mailman-2.0.9-index/Mailman/Archiver: indexing-2.0.9-0.1.patch
diff -r -u -P mailman-2.0.9-index/Mailman/Cgi/htdig.py mailman-2.0.9-htdig/Mailman/Cgi/htdig.py
--- mailman-2.0.9-index/Mailman/Cgi/htdig.py Thu Jan 1 01:00:00 1970
+++ mailman-2.0.9-htdig/Mailman/Cgi/htdig.py Mon Apr 8 18:00:35 2002
@@ -0,0 +1,181 @@
+# Copyright (C) 1998,1999,2000 by the Free Software Foundation, Inc.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License
+# as published by the Free Software Foundation; either version 2
+# of the License, or (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+
+"""Provide an authentication wrapper around archives accessed via
+returned results from htdig's htsearch. Access via htdig requires the
+user's request present a valid cookie authorizing access to the
+list's archives for private archives.
+This cookie must be obtained by the same process as the user must
+adopt for accessing the archive directly rather than via
+htsearch results. Indeed the user should only be able to reach the
+search facility, which appears on the list archives front page, if
+they have been through the authentication process. However, this code
+prevents someone hand fettling a URL on the browser or using one
+given to them by an authorised user, which might compromise the
+list's privacy.
+"""
+
+# this code was derived from the private.py cgi script
+
+import sys, os, string, cgi
+from Mailman import Utils, MailList, Errors
+from Mailman.htmlformat import *
+from Mailman.Logging.Utils import LogStdErr
+from Mailman import mm_cfg
+from Mailman.Logging.Syslog import syslog
+
+header_html = """Content-type: text/html
+
+
+
++ If you want to make another attempt to access a list archive then go via the
+ list users information page.
+
++ If this problem persists then please e-mail the following information to the +%s: +
++ %s + %s ++
+ The requested document cannot be found. + %s
+""" + +data_error_html = """+ The requested document cannot be read. +
+""" + +auth_error_html = """+ You are not authorised to access the URL referenced. +
++ This access failure may be due to: +
+