VM

issue in vm-reencode-mime-encoded-words with whitespace

Bug #1186772 reported by Anthony Mallet
This bug affects 1 person
Affects: VM
Status: Triaged
Importance: Medium
Assigned to: Uday Reddy

Bug Description

It can happen that vm-reencode-mime-encoded-words produces a wrongly encoded string where whitespace is dropped.

For instance, the string "fóó bàr" will typically be encoded to something like "=?iso-8859-1?Q?fóó?= =?iso-8859-1?Q?bàr?=" (I'm not using the real encoding here, for clarity). The important thing is that the whitespace is _not_ encoded, because it has no specific 'vm-charset property associated with it, so the function vm-reencode-mime-encoded-words skips it.

When this string is later decoded, the whitespace is stripped by the following code in vm-decode-mime-encoded-words (vm-mime.el:1045), which is otherwise correct:
 ;; suppress whitespace between encoded words.
 (and previous-end
      (string-match "\\`[ \t\n]*\\'"
                    (buffer-substring previous-end match-start))
      (setq match-start previous-end))
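
To make the effect concrete, here is a minimal, self-contained sketch of that decoding rule. It is not VM's code: the demo-* names are hypothetical, only the Q encoding is handled, and the input uses a real ISO-8859-1 Q encoding rather than the simplified form shown earlier.

```elisp
;; Sketch only: the demo-* names are hypothetical and are not part of VM.

(defconst demo-encoded-word-regexp
  "=\\?\\([^?]+\\)\\?[Qq]\\?\\([^?]*\\)\\?="
  "Match one Q-encoded word; group 1 is the charset, group 2 the text.")

(defun demo-q-decode (text coding)
  "Decode the Q-encoded TEXT of one encoded word using coding system CODING."
  (let ((i 0) (bytes nil))
    (while (< i (length text))
      (let ((c (aref text i)))
        (cond ((eq c ?_)                ; "_" stands for a space
               (push ?\s bytes)
               (setq i (1+ i)))
              ((eq c ?=)                ; "=XX" is one byte written in hex
               (push (string-to-number (substring text (1+ i) (+ i 3)) 16)
                     bytes)
               (setq i (+ i 3)))
              (t                        ; any other character is literal
               (push c bytes)
               (setq i (1+ i))))))
    (decode-coding-string (apply #'unibyte-string (nreverse bytes)) coding)))

(defun demo-decode-header (header)
  "Decode Q-encoded words in HEADER.
White space that only separates two adjacent encoded words is dropped."
  (let ((pos 0) (previous-end nil) (out ""))
    (while (string-match demo-encoded-word-regexp header pos)
      (let* ((start (match-beginning 0))
             (end (match-end 0))
             (coding (intern (downcase (match-string 1 header))))
             (text (match-string 2 header))
             (gap (substring header pos start)))
        ;; Keep the text before this encoded word, unless it is pure
        ;; white space sitting between two encoded words.
        (unless (and previous-end
                     (string-match "\\`[ \t\r\n]*\\'" gap))
          (setq out (concat out gap)))
        (setq out (concat out (demo-q-decode text coding)))
        (setq pos end
              previous-end end)))
    ;; Append whatever follows the last encoded word.
    (concat out (substring header pos))))
```

Evaluating (demo-decode-header "=?iso-8859-1?Q?f=F3=F3?= =?iso-8859-1?Q?b=E0r?=") returns "fóóbàr": the space that separated the two encoded words is lost, which is the symptom described in this report.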

I fixed the problem by encoding all characters in vm-reencode-mime-encoded-words, even those that don't require encoding.
This may be overkill, but the advantage is that the fix is simple and robust :) See attached patch.
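
The idea behind that fix can be sketched as follows. This is only an illustration under stated assumptions, not the attached patch: demo-q-encode-phrase is a hypothetical name, only the Q encoding is shown, and everything except ASCII letters and digits is hex-encoded for simplicity.

```elisp
;; Sketch only: demo-q-encode-phrase is a hypothetical name, not part of VM.

(defun demo-q-encode-phrase (phrase coding)
  "Return PHRASE as one Q-encoded word in coding system CODING.
Spaces are encoded inside the word (as \"_\") instead of being left outside."
  (let ((bytes (encode-coding-string phrase coding)))
    (concat "=?" (symbol-name coding) "?Q?"
            (mapconcat
             (lambda (b)
               (cond ((eq b ?\s) "_")          ; space becomes "_"
                     ((or (and (>= b ?a) (<= b ?z))
                          (and (>= b ?A) (<= b ?Z))
                          (and (>= b ?0) (<= b ?9)))
                      (string b))              ; ASCII letters and digits stay
                     (t (format "=%02X" b))))  ; everything else as "=XX"
             bytes "")
            "?=")))
```

With this, (demo-q-encode-phrase "fóó bàr" 'iso-8859-1) gives "=?iso-8859-1?Q?f=F3=F3_b=E0r?=", so the space travels inside the encoded word and survives decoding. A real implementation would also have to keep each encoded word within the 75-character limit of RFC 2047, which this sketch does not attempt.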

Note: I understand my description of the problem is probably unclear :) I can provide a real e-mail with problematic headers if needed.

Anthony Mallet (anthony-mallet-k) wrote :
Uday Reddy (reddyuday) wrote :

Is this a duplicate of Bug 1003975?

Anthony Mallet (anthony-mallet-k) wrote :

Exactly the same symptom, yes

Uday Reddy (reddyuday) wrote :

My reading of RFC 2047 is that only words are supposed to be encoded, not the white space. If white space is included, the strings can get too long and exceed the character limit.

I have no idea why vm-decode-mime-encoded-words is skipping white space. But the last time I investigated it, I narrowed it down to the fact that VM isn't decoding cached data. So I will need to fix that first whenever I get a chance, and then see if the problem remains.

Please feel free to use your patch in the meantime.

Changed in vm:
status: New → Triaged
importance: Undecided → Medium
assignee: nobody → Uday Reddy (reddyuday)
milestone: none → 8.2.0b2
Anthony Mallet (anthony-mallet-k) wrote :

I agree that encoding full sentences like in my patch is borderline. Still, I read in RFC2047 this (end of page 7):
``Only printable and white space character data should be encoded using this scheme.''
which makes me think that whitespace is special...

Actually, I first tried to suppress the whitespace removal in vm-mime.el:1045 (see code excerpt above), and this also fixed the issue regarding the subject line.

However, I also received e-mails where the subject header had whitespace between encoded words (namely newlines) that actually needed to be stripped. This matches the following excerpt from RFC 2047 (page 9):
``When displaying a particular header field that contains multiple
   'encoded-word's, any 'linear-white-space' that separates a pair of
   adjacent 'encoded-word's is ignored.''
This makes me think that the code at vm-mime.el:1045 and the following lines is correct.
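
The folded-header case can be illustrated with a small sketch. It is a simplification, not VM's code: demo-join-adjacent-encoded-words is a hypothetical name, and any "?=" followed by white space and "=?" is assumed to be the boundary between two encoded words.

```elisp
;; Sketch only: demo-join-adjacent-encoded-words is a hypothetical name.

(defun demo-join-adjacent-encoded-words (header)
  "Drop linear white space that separates adjacent encoded words in HEADER.
Simplification: any \"?=\" followed by white space and \"=?\" is treated as
the boundary between two encoded words."
  (replace-regexp-in-string "\\?=[ \t\r\n]+=\\?" "?==?" header t t))

;; A subject folded over two lines, followed by an ordinary word.
(demo-join-adjacent-encoded-words
 "=?iso-8859-1?Q?f=F3=F3?=\n =?iso-8859-1?Q?b=E0r?= baz")
```

The newline and space introduced by folding disappear because they sit between two encoded words, while the space before the ordinary word "baz" is kept. This is why the decoder cannot simply stop removing whitespace: it has no way to tell a fold from a space the sender meant to keep.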
