Generate RSS summary in archives

Bug #558014 reported by akuchling
This bug report is a duplicate of: Bug #317453: Atom or RSS feeds of Mailing lists.
This bug affects 2 people
Affects: GNU Mailman
Status: New
Importance: Critical
Assigned to: Unassigned

Bug Description

Here's a first-draft patch. Things that need fixing:

* The generated RSS feed needs to be validated. (It passed the
W3C's RDF validator, but it still needs to be run through RSS validators.)

* The date should be given in YYYY-MM-DD format, which requires
parsing the .fromdate attribute.

* How do I get the URL for an archived message? The generated RSS
currently just uses the filename, which is wrong. How do I get
at the PUBLIC_ARCHIVE_URL setting?

* Getting the most recent N postings is inefficient; the code loops through all of the archived messages and takes the last N of them.
We could add .last() and .prev() methods to the Database class, but that's more ambitious for 2.1beta than I like. (Would be nice to get this into 2.1final...)

* The list index page should have a LINK element pointing to
the RSS file.

Please make any comments you have, and I'll rework the patch accordingly.
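
On the "last N postings" point above: if the index only supports forward traversal, the selection itself can at least be done in a single pass with bounded memory. The following is a minimal sketch in modern Python, not code from the patch and not written for the Python version Mailman 2.1 targets. The name `last_n` and the sample message IDs are hypothetical.

```python
from collections import deque


def last_n(items, n):
    """Return the final n elements of any iterable, oldest first.

    The deque discards older entries as newer ones arrive, so at most
    n items are held in memory at any time.
    """
    return list(deque(items, maxlen=n))


# Stand-in for walking a date-ordered index from oldest to newest.
message_ids = ("msg%03d" % i for i in range(1, 8))
print(last_n(message_ids, 3))
```

This still visits every entry, so it does not remove the need for something like the .last() and .prev() methods mentioned above; it only keeps the traversal cheap in memory.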

akuchling (akuchling) wrote :

Logged In: YES
user_id=11375

Argh; SF choked on the file upload. Attaching the patch again...

akuchling (akuchling) wrote :

The file rss.patch was added: Generate an RSS summary for lists

bwarsaw (bwarsaw) wrote :

Logged In: YES
user_id=12800

Deferring until post-2.1

captainlarry (captainlarry-users) wrote :

Logged In: YES
user_id=147905

Just voting for support here. This is *great* thanks for
the patch and I hope the maintainers include it as soon as
it's appropriate :)

Adam.

akuchling (akuchling) wrote :

Logged In: YES
user_id=11375

Updated patch:

* Dates are now rendered as ISO-8601 (date only, not the time of the message)

* By hard-wiring 2002-December, I got the RSS to validate using Mark Pilgrim's validator.
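
For readers following along, the date conversion might look roughly like the sketch below. It assumes the .fromdate attribute holds a string in the format produced by time.ctime(), which this thread does not confirm, and it assumes an English (C) locale for the day and month names. The function name `iso_date` is hypothetical, and the code is modern Python rather than the patch itself.

```python
from datetime import datetime


def iso_date(fromdate):
    """Convert a ctime-style string to an ISO-8601 date (no time part).

    Assumes input such as 'Mon Dec  2 10:15:00 2002'.
    """
    parsed = datetime.strptime(fromdate.strip(), "%a %b %d %H:%M:%S %Y")
    return parsed.date().isoformat()


print(iso_date("Mon Dec  2 10:15:00 2002"))
```

If the attribute turns out to use a different format, only the format string passed to strptime() would need to change.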

akuchling (akuchling) wrote :

The file rss.patch was added: Updated patch

uche (uche) wrote :

Logged In: YES
user_id=38966

I'd like to add my vote to this item. This is a fantastic
idea, Andrew. Thanks.

--Uche

jmason (jmason-users-sf) wrote :

Logged In: YES
user_id=935

Big thumbs up from me too. Much better solution than
http://taint.org/mmrss/ ;)

bwarsaw (bwarsaw) wrote :

Logged In: YES
user_id=12800

Andrew, to get the url for the archived message use
mlist.GetBaseArchiveURL(), which knows about private vs.
public archives, the host name and the list name. From
there you should be able to tack on just the part of the
path under "archives/private/listname". See
Mailman/Handlers/Scrubber.py for an example.

Only other minor comment: NUM_ARTICLES can probably go in
Defaults.py.in
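
The "tack on" step described above could be sketched as follows. This is illustrative only: the base URL is passed in as a plain string standing in for whatever mlist.GetBaseArchiveURL() returns, and the function name `article_url`, the example host, and the example file name are all hypothetical.

```python
def article_url(base_url, period, filename):
    """Join a base archive URL, a period directory, and an article file name.

    Stray slashes at the joins are normalised so the result never
    contains '//' inside the path.
    """
    parts = [base_url.rstrip("/"), period.strip("/"), filename.lstrip("/")]
    return "/".join(parts)


# Example values only; a real base URL would come from the list object.
print(article_url("http://example.org/pipermail/mylist/",
                  "2002-December", "000123.html"))
```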

danbri (danbri) wrote :

Logged In: YES
user_id=7830

Does anyone have a patch to remove the hardwiring of
"2002-December" and get the appropriate date from mailman
somehow?
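
One possible direction, sketched under the assumption that the list is archived in monthly volumes named like "2002-December": derive the directory name from a date instead of hard-wiring it. The helper name `monthly_period` is hypothetical, the month names are spelled out so the result does not depend on the locale, and this is modern Python rather than a tested change to the patch.

```python
import datetime

MONTHS = ("January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December")


def monthly_period(date):
    """Return a 'YYYY-Month' directory name for the given date."""
    return "%04d-%s" % (date.year, MONTHS[date.month - 1])


print(monthly_period(datetime.date(2002, 12, 2)))
```

Asking the archiver which volumes actually exist would be more robust than computing the name, since a computed name can point at a month with no postings.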

danbri (danbri) wrote :

Logged In: YES
user_id=7830

I thought I'd have a look at this myself, though I have only modest
knowledge of both Python and Mailman.

In the course of trying to patch the patch, I tried running
the archiver over just the last couple of messages, to speed
things along:
"../../bin/arch -s 4390 rdfweb-dev".
Traceback (most recent call last):
  File "../../bin/arch", line 187, in ?
    main()
  File "../../bin/arch", line 177, in main
    archiver.close()
  File "/usr/local/mailman/Mailman/Archiver/pipermail.py", line 310, in close
    self.write_TOC()
  File "/usr/local/mailman/Mailman/Archiver/HyperArch.py", line 1082, in write_TOC
    rss.write(self.RSS())
  File "/usr/local/mailman/Mailman/Archiver/HyperArch.py", line 769, in RSS
    date, msgid = self.database.dateIndex.first()
AttributeError: HyperDatabase instance has no attribute 'dateIndex'

Not sure what's going on there, but this seemed as good a
place as any to keep a note of it.

Investigating...

danbri (danbri) wrote :

Logged In: YES
user_id=7830

OK, I've regenerated the patch with some code which works
for me.

http://rdfweb.org/2003/06/mailman-rss/rsspatch

Health warning:

    * I suspect it may fail when get_archives() returns
      a list rather than a string (does this ever happen?).
    * See also the problems mentioned in my previous comment;
      regenerating partial archives seems tricky.

Hope this is useful anyways...

Dan <email address hidden>

akuchling (akuchling) wrote :

Logged In: YES
user_id=11375

Here at last is an updated version of the patch that's crawling closer to being complete. There's now an RSS_NUM_ARTICLES setting in Defaults.py, the generated URLs are correct, and I modified the English template to link to the RSS file.

Remaining things: check the generated RSS for correctness; edit all of the other language templates to include the RSS file (I may ask for CVS write access to do that). It would be really nice if the Mailman upgrade script could update existing general list information pages to include the LINK element; any suggestion about how to go about that?
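
On the upgrade question, one approach might be an idempotent text edit of each existing page. The sketch below is an assumption-laden illustration, not part of the patch or of any Mailman upgrade script: the function names are hypothetical, it treats the page as plain text, and it uses the common RSS autodiscovery form of the LINK element, which may differ from what the patch's template emits.

```python
import re
from html import escape


def rss_link_element(href, title="RSS"):
    """Build an autodiscovery LINK element for an RSS file."""
    return ('<link rel="alternate" type="application/rss+xml" '
            'title="%s" href="%s">'
            % (escape(title, quote=True), escape(href, quote=True)))


def add_rss_link(page, href, title="RSS"):
    """Insert the LINK element before </head> unless one is already there."""
    if "application/rss+xml" in page:
        return page  # already updated; running twice changes nothing
    match = re.search(r"</head\s*>", page, re.IGNORECASE)
    if match is None:
        return page  # no head section found; leave the page untouched
    return page[:match.start()] + rss_link_element(href, title) + "\n" + page[match.start():]


page = "<html><head><title>mylist</title></head><body></body></html>"
print(add_rss_link(page, "rss.xml"))
```

Leaving pages without a recognisable head section untouched is deliberate here, since list owners may have customised their pages heavily.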

akuchling (akuchling) wrote :

Logged In: YES
user_id=11375

Attaching correct version of the patch.

akuchling (akuchling) wrote :

The file rss.patch was added: July 2003 version of the patch

akuchling (akuchling) wrote :

Logged In: YES
user_id=11375

OK, done!

This patch is now ready to go in: some people have looked at
the RSS and haven't spotted any problems. Barry, can I
please get CVS write access to check this in?

bwarsaw (bwarsaw) wrote :

Logged In: YES
user_id=12800

Bumping priority.

wookiew (wookiew) wrote :

Logged In: YES
user_id=863445

So far I have applied the patch (by the way, I hope that
Defaults.py.in in the patch *means* Defaults.py) and
restarted Mailman. I added the two lines (correctly, I hope) to
listinfo.html (under /de/, because we have German-speaking lists)
and went looking for the XML file.
After searching the whole device (just to be sure) I can say:
there is no such file. Is another patch needed first?
Is there another setup step to perform? I can't find any hint here, so I
have to ask. But the idea is great; if it works on my lists
it's genius.
regards, Michael
running version 2.1.1

codewhacker (codewhacker) wrote :

Logged In: YES
user_id=670974

I'm trying to enrich the RSS output by adding a proper
[description] and a [content:encoded] module, but I am
having the devil's own time locating the raw message text.
Be happy to contribute a patch if you can point me to the
raw content (without the italics markup for quoting).

Thanks!

ppsys (ppsys-users) wrote :

Logged In: YES
user_id=75166

The following is based on the July 2003 version of the patch file posted
on sourceforge.

The RSS patch adds the RSS() function as member function of the
HyperArchive class defined in HyperArch.py.

It has been reported that the following statement in RSS():

    date, msgid = self.database.dateIndex.first()

 may generate an AttributeError exception:

    AttributeError: HyperDatabase instance has no attribute 'dateIndex'

The RSS patch appears to make the assumption that whenever the
RSS() function is called from the write_TOC() member function of the
HyperArchive class the __openIndices() function has already been called
on the latest period archive associated with the list, whose TOC page is
being generated by write_TOC(), and that no intervening call to
__closeIndices() has been made.

If the assumption were correct, then whenever the RSS() function was
called on a HyperArchive instance, the xxxxxIndices attributes of the
HyperDatabase instance "owned" by the HyperArchive instance would be
pointing to valid instances of DumbBTree.

Unfortunately, this assumption is not correct. In order to do its work,
write_TOC() does not itself need to perform any call to the
__openIndices() function for the list/archive/database whose TOC page is
to be recreated. It just happens that in some circumstances, some of the
code which might call write_TOC may have called the __openIndices()
function at some prior point and left the HyperDatabase instance with a
valid set of xxxxxIndices attributes in place when write_TOC() is called.

For the RSS patch to work reliably, the code in the RSS() function has
to be changed so that it ensures the conditions it relies on hold when
it executes the statement causing the problem.

The following is an untested code change but if part of the RSS()
function's code definition in HyperArch.py is modified from:

<quote>
        # Get the most recent messages. The only index operation
        # we can count on is traversal by increasing date, so
        # we end up traversing all of the entries and remembering the last
        # N of them. Sigh.
        items = []
        try:
            date, msgid = self.database.dateIndex.first()
            items.append(msgid)
        except KeyError:
            pass

        while 1:
            try:
</quote>

to read:

<quote>
        # Get the most recent messages. The only index operation
        # we can count on is traversal by increasing date, so
        # we end up traversing all of the entries and remembering the last
        # N of them. Sigh.
        items = []
        got_first = 0
        try:
            msgid = self.database.first(self.archives[0], 'date')
            if msgid:
                items.append(msgid)
                got_first = 1
        except KeyError:
            pass

        while got_first and 1:
            try:
</quote>

this should fix the exception problem.
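
To illustrate the general failure mode and the shape of the fix outside Mailman, here is a toy sketch in modern Python. ToyDatabase and its methods are hypothetical stand-ins, not Mailman's HyperDatabase API: an attribute that only exists after an explicit open call raises AttributeError when read too early, whereas an accessor that opens on demand does not depend on what earlier callers happened to do.

```python
class ToyDatabase:
    """Toy stand-in for an archive database (not Mailman's HyperDatabase)."""

    def __init__(self, entries):
        self._entries = entries  # (date, msgid) pairs
        # date_index deliberately does not exist until open_indices() runs.

    def open_indices(self):
        self.date_index = sorted(self._entries)

    def first_unsafe(self):
        # Raises AttributeError if open_indices() was never called.
        return self.date_index[0]

    def first(self):
        # Open on demand, and report an empty index as None.
        if not hasattr(self, "date_index"):
            self.open_indices()
        return self.date_index[0] if self.date_index else None


db = ToyDatabase([(20021202, "b@example.org"), (20021201, "a@example.org")])
try:
    db.first_unsafe()
except AttributeError as exc:
    print("unsafe access failed:", exc)
print("safe access:", db.first())
```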

ssrjazz (ssrjazz) wrote :

Logged In: YES
user_id=198250

Does anyone have any idea why when I run ~mailman/bin/arch
<listname> it will generate the rss.xml file, but when new
emails come in to said list it doesn't do ANYTHING with the
xml file?

I'd like for the rss feed to update itself every time a new
post comes into a list. Right now it isn't doing that.

Kevin Cole (kjcole) wrote :

Still "new" and "critical"? (Just trying to help w/ housecleaning.)
