Comment 3 for bug 888572

Dan Scott (denials) wrote:

Ah, one more commit, with some happiness on the corruption front: it turned out that the input from the test case never had decode_utf8() invoked on it, which is why we ran into the misery that we saw.

So, my extra commit makes a precautionary decode_utf8() call on the input, because that's the sane thing to do: it ensures that the regexes know they're dealing with a Unicode string rather than some random binary string, and can behave accordingly. I've also restored the order of the entityize() call in this commit.
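
For anyone wondering why the decode matters to the regexes, here is a minimal standalone sketch. It is not the code from the commit; the sample string and variable names are made up for illustration, and the only library call used is decode_utf8() from the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Hypothetical input: "café" as raw UTF-8 bytes, the way it might arrive
# from a file or a request before any decoding has happened.
my $bytes = "caf\xC3\xA9";

# Undecoded, Perl sees one character per byte, so the two bytes that
# encode "é" count as two separate characters.
my $byte_length = length($bytes);

# Decoded, the same data is a string of Unicode characters.
my $text = decode_utf8($bytes);
my $char_length = length($text);

# A pattern meaning "exactly four word characters" can only match the
# decoded string, where "é" is a single word character.
my $bytes_match = ($bytes =~ /\A\w{4}\z/) ? 1 : 0;
my $text_match  = ($text  =~ /\A\w{4}\z/) ? 1 : 0;

print "bytes: length=$byte_length match=$bytes_match\n";
print "text:  length=$char_length match=$text_match\n";
```

The same thing applies to any character class or length-sensitive pattern: run against undecoded bytes, the regex sees the pieces of a multi-byte character rather than the character itself.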