Permalink-Hash: header specification

Bug #985149 reported by Barry Warsaw
This bug affects 1 person
Affects: GNU Mailman
Status: Confirmed
Importance: High
Assigned to: Unassigned
Milestone: (none)

Bug Description

Currently, we define the X-Message-ID-Hash as the base32 encoding of the sha1 hash of the Message-ID content (sans angle brackets as defined in RFC 5322). The suggestion is that the List-ID value should be added to this hash so that cross-posted messages can be distinguished.

This should be fine, and pretty easy. My only concern is that the header name is now a misnomer. I'm suggesting we change this to Permalink-Hash.

Here is the proposed algorithm for calculating the Permalink-Hash:

>>> from base64 import b32encode
>>> from hashlib import sha1
>>> bare_msgid = msg['Message-ID'][1:-1]  # remove the angle brackets
>>> h = sha1(bare_msgid.encode('utf-8'))  # hash input must be bytes on Python 3
>>> list_id = msg['List-ID'][1:-1]  # remove the angle brackets
>>> h.update(list_id.encode('utf-8'))
>>> permalink_hash = b32encode(h.digest()).decode('ascii')
>>> msg.add_header('Permalink-Hash', permalink_hash, version='1')

Notes:
 - If the Message-ID or List-ID values do not both start and end with angle brackets, the entire header value should be used (i.e. only strip off bytes 0 and -1 if they are angle brackets)
 - The Permalink-Hash header gets a "version=1" parameter to indicate the version of this spec the header conforms to.
 - RFC 5064 defines the Archived-At header, which mm3 already supports. It is suggested that the Archived-At URI use the permalink hash to point directly to this message in the specified archive.
 - Some people suggest that the List-ID be explicit in the Permalink-Hash value. I would reject this on the grounds that the value should be opaque. If this information is needed, it should be added to the Archived-At header.
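
The algorithm and notes above can be sketched in Python 3 roughly as follows. This is an illustration only: the helper names `strip_brackets` and `permalink_hash` are hypothetical, and the UTF-8 encoding is an assumption, since the report does not say how header values should be converted to bytes.

```python
from base64 import b32encode
from email.message import Message
from hashlib import sha1


def strip_brackets(value):
    """Drop the surrounding angle brackets only when both are present."""
    if value.startswith('<') and value.endswith('>'):
        return value[1:-1]
    # Otherwise the entire header value is used, as the first note says.
    return value


def permalink_hash(message_id, list_id, encoding='utf-8'):
    """Base32 SHA-1 of the bare Message-ID followed by the bare List-ID.

    The encoding argument is an assumption; the report does not define one.
    """
    h = sha1(strip_brackets(message_id).encode(encoding))
    h.update(strip_brackets(list_id).encode(encoding))
    return b32encode(h.digest()).decode('ascii')


msg = Message()
msg['Message-ID'] = '<bar>'
msg['List-ID'] = '<foo.example.com>'
# The version parameter marks which revision of the spec produced the hash.
msg.add_header('Permalink-Hash',
               permalink_hash(msg['Message-ID'], msg['List-ID']),
               version='1')
print(msg['Permalink-Hash'])
```

Because the brackets are stripped only when both are present, `<bar>` and `bar` produce the same hash, while a malformed value such as `<bar` is hashed unchanged.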

Tags: mailman3
Revision history for this message
Richard Wackerbarth (wacky) wrote : Re: [Bug 985149] [NEW] Add List-Post value to permalink hash input

Barry,

I definitely agree that "Now's the time".

I don't understand the proposal. By "added to this hash", do you mean "included in the set of elements that get hashed" or do you mean "appended to the hash value"?

Presumably, the sole purpose in publishing an algorithm to create the hash is to make it possible for two handlers to independently develop the same hash given only the message. Otherwise, a "secret" method could be used to assign a unique identifier to the message.

In either case, this suggested change renews my argument that the resulting hash should be tagged, visibly, with a "protocol revision designator". Omitting that designation transforms the chosen calculation method into a "secret".

Richard

On Apr 18, 2012, at 1:53 PM, Barry Warsaw wrote:

> Public bug reported:
>
> Currently, we define the X-Message-ID-Hash as the base32 encoding of the
> sha1 hash of the Message-ID content (sans angle brackets as defined in
> RFC 5322). The suggestion is made that List-Post value should be added
> to this hash so as to be able to distinguish cross-posted messages.
>
> This should be fine, and pretty easy. My only concern is that the
> header name is now a misnomer.
>
> I wonder, is it worth coming up with a better header? Now's the time to
> do it since it's likely that there are almost no consumers of this
> standard.
>
> What about `Permalink-Hash` ?
>
> ** Affects: mailman
> Importance: High
> Status: Confirmed
>
>
> ** Tags: mailman3

Revision history for this message
Barry Warsaw (barry) wrote :

On Apr 18, 2012, at 07:22 PM, Richard Wackerbarth wrote:

>I don't understand the proposal. By "added to this hash", do you mean
>"included in the set of elements that get hashed" or do you mean
>"appended to the hash value"?

I mean "append (or prepend, we have to decide ;)" to the hash input.

Specifically, let's say you have this message snippet:

    List-Post: foo.example.com
    Message-ID: <bar>

Under the current algorithm, the hash is:

    >>> from base64 import b32encode
    >>> from hashlib import sha1
    >>> s = sha1(b'bar')  # hashlib requires bytes on Python 3
    >>> b32encode(s.digest()).decode('ascii')
    'MLG3OAQP7EQOLKTEFQ6UAZUVBXI7AH2N'

but after the change suggested in this bug, it would be:

    >>> s = sha1(b'bar')
    >>> s.update(b'foo.example.com')
    >>> b32encode(s.digest()).decode('ascii')
    'P67IMDMX6CRPP3TXX26OMJEOX2DDK6WN'
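
As a small sketch of why this helps with cross-posts: calling `update()` is equivalent to hashing the concatenated input, so the same Message-ID delivered through two different lists yields two different hashes. The helper `b32_sha1` and the second list name `baz.example.com` are hypothetical, chosen only for illustration.

```python
from base64 import b32encode
from hashlib import sha1


def b32_sha1(*parts):
    """Feed each part into one SHA-1 object, then base32-encode the digest."""
    h = sha1()
    for part in parts:
        h.update(part.encode('ascii'))
    return b32encode(h.digest()).decode('ascii')


# Updating incrementally is equivalent to hashing the concatenated input.
incremental = b32_sha1('bar', 'foo.example.com')
concatenated = b32_sha1('barfoo.example.com')

# Hypothetical cross-post: one Message-ID delivered through two lists.
copy_on_foo = b32_sha1('bar', 'foo.example.com')
copy_on_baz = b32_sha1('bar', 'baz.example.com')

print(incremental == concatenated)  # True
print(copy_on_foo != copy_on_baz)   # True
```

One consequence of plain concatenation is that the boundary between the two values is not part of the hash input, which is one reason the append-or-prepend order has to be fixed by the spec.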

>Presumably, the sole purpose in publishing an algorithm to create the
>hash is to make it possible for two handlers to independently develop
>the same hash given only the message. Otherwise, a "secret" method could
>be used to assign a unique identifier to the message.

Exactly.

>In either case, this suggested change renews my argument that the
>resulting hash should be tagged, visibly, with a "protocol revision
>designator". Omitting that designation transforms the chosen calculation
>method into a "secret".

The way to do that is probably to use a parameter on the header, e.g.

    Permalink-Hash: P67IMDMX6CRPP3TXX26OMJEOX2DDK6WN; version=1
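
A consumer such as an archiver could read the hash and the version parameter back with the standard library's `email` package. This is a minimal sketch, assuming the header looks exactly like the example above; the sample message is made up for illustration.

```python
from email import message_from_string

raw = (
    'Message-ID: <bar>\n'
    'Permalink-Hash: P67IMDMX6CRPP3TXX26OMJEOX2DDK6WN; version=1\n'
    '\n'
    'body\n'
)
msg = message_from_string(raw)

# The hash itself is everything before the first semicolon.
hash_value = msg['Permalink-Hash'].split(';', 1)[0].strip()

# get_param() returns None when the header or the parameter is missing.
version = msg.get_param('version', header='Permalink-Hash')

print(hash_value, version)
```

Checking `version` before interpreting the hash lets a consumer reject or special-case values produced by a later revision of the algorithm.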

Revision history for this message
Barry Warsaw (barry) wrote : Re: Add List-ID value to permalink hash input

s/List-Post/List-ID/

summary: - Add List-Post value to permalink hash input
+ Add List-ID value to permalink hash input
Barry Warsaw (barry)
description: updated
summary: - Add List-ID value to permalink hash input
+ Add Permalink-Hash: header for interoperability with archivers.
summary: - Add Permalink-Hash: header for interoperability with archivers.
+ Permalink-Hash: header specification
Barry Warsaw (barry)
description: updated
Barry Warsaw (barry)
Changed in mailman:
milestone: 3.0.0b2 → none
Revision history for this message
Abhilash Raj (raj-abhilash1) wrote :

This bug has been moved to the new GitLab repository: https://gitlab.com/mailman/mailman/issues/15
