VM

issue in vm-reencode-mime-encoded-words with whitespace

Bug #1186772 reported by Anthony Mallet on 2013-06-02

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
VM	Triaged	Medium	Uday Reddy	VM 8.3.2a

Bug Description

It can happen that vm-reencode-mime-encoded-words produces a wrongly encoded string where whitespace is dropped.

For instance, the string "fóó bàr" will typically be encoded to something similar to "=?iso-8859-1?Q?fóó?= =?iso-8859-1?Q?bàr?=" (I'm not using the real encoding here, for clarity). The important thing is that the whitespace is _not_ encoded, because it has no specific 'vm-charset property associated with it and the function vm-reencode-mime-encoded-words skips it.

When this string is later decoded, the whitespace is stripped by the following code in vm-decode-mime-encoded-words, which is otherwise correct:
vm-mime.el:1045
;; suppress whitespace between encoded words.
(and previous-end
     (string-match "\\`[ \t\n]*\\'"
                   (buffer-substring previous-end match-start))
     (setq match-start previous-end))
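
To make the effect of that rule concrete, here is a minimal Python sketch of the same idea. It is not VM's code: the names `ENCODED_WORD` and `decode_words` are made up for this illustration, and the regular expression is a simplification of the RFC 2047 grammar. Only the whitespace-only gap between two adjacent encoded-words is dropped; text before the first encoded-word and after the last one is kept.

```python
import re
from email.header import decode_header

# Simplified pattern for an RFC 2047 encoded-word (an assumption for this
# sketch, not the full grammar): =?charset?Q-or-B?encoded-text?=
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[QqBb]\?[^?\s]*\?=")


def decode_words(header):
    """Decode encoded-words, dropping whitespace-only gaps between them."""
    out = []
    previous_end = None  # end of the previous encoded-word, if any
    pos = 0
    for m in ENCODED_WORD.finditer(header):
        gap = header[pos:m.start()]
        # Same test as the Lisp excerpt: a gap made only of space, tab or
        # newline that follows an earlier encoded-word is suppressed.
        if previous_end is None or gap.strip(" \t\n") != "":
            out.append(gap)
        data, charset = decode_header(m.group(0))[0]
        out.append(data.decode(charset))
        previous_end = pos = m.end()
    out.append(header[pos:])
    return "".join(out)


two_words = "=?iso-8859-1?Q?f=F3=F3?= =?iso-8859-1?Q?b=E0r?="
print(decode_words(two_words))  # the space between the two words is lost
```

With the per-word encoding described above, "fóó bàr" therefore comes back without its space, which matches the reported symptom.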

I fixed the problem by encoding all characters in vm-reencode-mime-encoded-words, even those that don't require encoding.
This may be overkill, but the advantage is that the fix is simple and robust :) See attached patch.
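
As a rough illustration of why encoding everything avoids the problem (this is not the attached patch, which changes VM's Lisp code), the sketch below uses Python's standard `email.header` module to encode the whole phrase as one encoded-word. The helper names `encode_whole_phrase` and `decode` are hypothetical. In Q encoding the space travels inside the word, so there is no gap between encoded-words for a decoder to drop.

```python
from email.header import Header, decode_header


def encode_whole_phrase(text, charset="iso-8859-1"):
    """Encode the entire phrase, spaces included, rather than word by word."""
    return Header(text, charset).encode()


def decode(header):
    """Join the decoded parts returned by decode_header into one string."""
    return "".join(
        data.decode(cs or "ascii") if isinstance(data, bytes) else data
        for data, cs in decode_header(header)
    )


encoded = encode_whole_phrase("fóó bàr")
print(encoded)          # a single encoded-word with no literal space in it
print(decode(encoded))  # the space survives the round trip
```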

Note: I understand my description of the problem is probably unclear :) I can provide a real e-mail with problematic headers if needed.

Revision history for this message

Anthony Mallet (anthony-mallet-k) wrote on 2013-06-02:

#1

Fix re-encoding of whitespace (473 bytes, text/plain)

Uday Reddy (reddyuday) wrote on 2013-06-02:

#2

Is this a duplicate of Bug 1003975?

Anthony Mallet (anthony-mallet-k) wrote on 2013-06-02:

#3

Exactly the same symptom, yes

Uday Reddy (reddyuday) wrote on 2013-06-02:

#4

My reading of RFC 2047 is that only words are supposed to be encoded, not the white space. If white space is included, the strings can get too long and exceed the character limit.
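
For reference, the limit in question is the one in RFC 2047, section 2: an encoded-word may not be longer than 75 characters, delimiters included. The sketch below is only an illustration of that concern, using Python's standard `email.charset` module; `overlong_encoded_words` and the simplified pattern are assumptions of this example. `Charset.header_encode` encodes its argument as a single encoded-word without splitting, so a long phrase encoded whole, spaces included, goes over the limit.

```python
import re
from email.charset import Charset

MAX_ENCODED_WORD = 75  # RFC 2047, section 2

# Simplified encoded-word pattern (an assumption for this sketch).
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[QqBb]\?[^?\s]*\?=")


def overlong_encoded_words(header):
    """Return the encoded-words in header that exceed the 75-character limit."""
    return [w for w in ENCODED_WORD.findall(header) if len(w) > MAX_ENCODED_WORD]


latin1 = Charset("iso-8859-1")
short = latin1.header_encode("fóó bàr")
long_phrase = latin1.header_encode("fóó bàr " * 10)

print(len(short), overlong_encoded_words(short))
print(len(long_phrase), len(overlong_encoded_words(long_phrase)))
```

An encoder that includes whitespace would therefore also have to split long phrases into several encoded-words, each within the limit.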

I have no idea why vm-decode-mime-encoded-words is skipping white space. But the last time I investigated it, I narrowed it down to the fact that VM isn't decoding cached-data. So I will need to fix that first whenever I get a chance, and then see if the problem remains.

Please feel free to use your patch in the meantime.

Changed in vm:
status:	New → Triaged
importance:	Undecided → Medium
assignee:	nobody → Uday Reddy (reddyuday)
milestone:	none → 8.2.0b2

Anthony Mallet (anthony-mallet-k) wrote on 2013-06-02:

#5

I agree that encoding full sentences like in my patch is borderline. Still, I read this in RFC 2047 (end of page 7):
``Only printable and white space character data should be encoded using this scheme.''
which makes me think that whitespace is special...

Actually, I first tried to suppress the whitespace removal in vm-mime.el:1045 (see code excerpt above), and this also fixed the issue regarding the subject line.

However, I also received e-mails where the subject header had whitespace between encoded words (namely newlines) that actually needed to be stripped. This matches this excerpt from RFC 2047 (page 9):
``When displaying a particular header field that contains multiple
'encoded-word's, any 'linear-white-space' that separates a pair of
adjacent 'encoded-word's is ignored.''
This makes me think that vm-mime.el:1045 and the following lines are correct.
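
As an independent cross-check of that reading, here is a small sketch using the decoder in Python's standard `email.header` module (the `decode` helper is a hypothetical convenience wrapper for this example). A folded header, with a newline and a space between two encoded-words, is joined without the gap, while whitespace between an encoded-word and ordinary text is kept.

```python
from email.header import decode_header


def decode(header):
    """Join the decoded parts returned by decode_header into one string."""
    return "".join(
        data.decode(cs or "ascii") if isinstance(data, bytes) else data
        for data, cs in decode_header(header)
    )


# Two encoded-words separated only by folding whitespace.
folded = "=?iso-8859-1?Q?f=F3=F3?=\n =?iso-8859-1?Q?b=E0r?="
# An encoded-word followed by ordinary text.
mixed = "=?iso-8859-1?Q?f=F3=F3?= bar"

print(decode(folded))  # the folding whitespace is ignored
print(decode(mixed))   # the space before "bar" is preserved
```

So the decoding side behaves as the RFC describes, and a space that must survive has to be carried inside an encoded-word by the encoder.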
