OpenILS::SIP::clean_text() can crash

Bug #1542495 reported by Galen Charlton
22
This bug affects 4 people
Affects Status Importance Assigned to Milestone
Evergreen
Fix Released
Undecided
Unassigned
SIPServer
Fix Released
Undecided
Unassigned

Bug Description

On Debian Jessie, OpenILS::SIP::clean_text() can crash with "cannot decode string with wide characters" upon attempting to munge a data element with non-ASCII, non-Latin1 characters.

Perl: 5.20.2
Encode module version: 2.60
Evergreen 2.9.1 (and likely 2.8.x and earlier)

Revision history for this message
Galen Charlton (gmc) wrote :

The immediate cause of the breakage appears to be this pull request [1] in Encode 2.53.

[1] https://github.com/dankogai/p5-encode/pull/11

Revision history for this message
Galen Charlton (gmc) wrote :

And a small test case: works on Wheezy (Perl 5.14, Encode 2.47) but not Jessie:

#!/usr/bin/perl

use strict;
use warnings;
use utf8;
use Encode;

my $str = "中华人民共和国";
my $oct = decode_utf8($str);

binmode STDOUT, ':raw';
print $oct, "\n";

description: updated
Jeff Godin (jgodin)
Changed in evergreen:
status: New → Confirmed
Revision history for this message
Jason Stephenson (jstephenson) wrote :

I can confirm this is an issue on Ubuntu 16.04 also.

Checking for is_utf8($str) before attempting the decode doesn't help, because the sample given in comment #2 is utf8.

Neither does switching to decode('utf-8', $str).

So, how are we supposed to fix this?

Revision history for this message
Jason Stephenson (jstephenson) wrote :

Just a couple of minutes with Google and I see it.

It's already UTF-8 so no point in decoding/encoding it again.

I vote we remove clean_text from the Evergreen SIP code and go with a variation on my solution to https://bugs.launchpad.net/sipserver/+bug/1463943

If we handle it all in SIPServer's write_msg then there's only 1 place we need to deal with it.

Revision history for this message
Jason Stephenson (jstephenson) wrote :

One last comment for today:

It looks like decode has been confused for encode in the example provided.

From reading the documentation and some stackoverflow discussions, you'd really want endcode there and not decode.

I still stand by my suggestion of dropping clean_text and handling encoding in SIPServer from comment #4.

Revision history for this message
Jason Boyer (jboyer) wrote :

+1 to dropping clean_text and doing all of the final encoding as close to the copper as possible.

I’ve got 99 problems but encoding ain’t... oh.

Changed in sipserver:
status: New → Confirmed
Revision history for this message
Jason Stephenson (jstephenson) wrote :

So, I've assigned this to myself to work on it and related bug 1463943.

The plan is to remove OpenILS::SIP::clean_text() and do the encoding in SIPServer.

I believe something like clean_text will be moved/implemented there.

Changed in evergreen:
assignee: nobody → Jason Stephenson (jstephenson)
Changed in sipserver:
assignee: nobody → Jason Stephenson (jstephenson)
Revision history for this message
Jason Stephenson (jstephenson) wrote :

My plan for resolving this is to have the OpenILS::SIP modules expect to get UTF-8 from Evergreen, since that is what Evergreen uses internally in most cases. This should remove the need for any encode/decode tricks in OpenILS::SIP.

SIPServer will be modified and documented to expect UTF-8 from the backend. It will then handle encoding output from UTF-8 to what is expected by the configuration.

Revision history for this message
Jason Stephenson (jstephenson) wrote :

Actually, we should be decoding the text when pulling it out of the database and not encoding it until we're outputting it somewhere. All text should stay in the default Perl encoding until Perl is done with it. A thorough fix for that will require an audit of all the Perl code in Evergreen and OpenSRF, most likely.

(I finally started working on this for realz again.)

Revision history for this message
Jason Stephenson (jstephenson) wrote :

The branch at working/user/dyrcona/lp1542495-remove-OpenILS-SIP-clean_text is my attempt at fixing this. It must be applied in concert with the SIPServer branch on bug 1463943.

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/dyrcona/lp1542495-remove-OpenILS-SIP-clean_text

It has had limited testing on Ubuntu 16.04. I plan to test it on Ubuntu 14.04 and possibly Debian Wheezy and Debian Jessie.

I need to add a release note and emphasize the importance of upgrading SIPServer at the same time.

tags: added: needsreleasenote pullrequest
Revision history for this message
Jason Stephenson (jstephenson) wrote :

I added release notes to the branch and altered the sample oils_sip.xml.example configuration. I'm recommending that the encoding element move from implementation_config into the institution element.

I would like another pair of eyes on the release notes, please. I'm not exactly thrilled with them and if anyone can find a simpler way to explain things, I'd welcome any suggestions or edits.

I've tested this on Ubuntu 16.04 and 14.04 with Evergreen master and SIPServer master and the concerto dataset. Everything seems to work. Unfortunately, testing SIP is not something that we're setup to do, yet. I used Bill Erickson's pysip2 library to send some requests to the server both before and after patching. Things that were broken before the patch worked after.

Changed in evergreen:
assignee: Jason Stephenson (jstephenson) → nobody
milestone: none → 2.next
Changed in sipserver:
assignee: Jason Stephenson (jstephenson) → nobody
tags: removed: needsreleasenote
Revision history for this message
Martha Driscoll (mjdriscoll) wrote :

NOBLE has been running this code, along with the SIPServer code in lp1463943, on a production sip server running Evergreen 2.12.2 and Debian Jessie since January 2017. It resolves an issue with our state-wide delivery vendor who queries Evergreen via SIP for destination/owning library information. Their software would throw errors when encountering non-ascii characters.

The code has resolved the issue with the delivery vendor and has not caused any issues with the numerous other sip clients that query our server.

I think the release notes are clear and understandable.

Revision history for this message
Martha Driscoll (mjdriscoll) wrote :
Revision history for this message
Terran McCanna (tmccanna) wrote :

Added signedoff tag for Martha

tags: added: signedoff
Revision history for this message
Galen Charlton (gmc) wrote :

Pushed to master. Thanks, Jason and Martha!

Changed in evergreen:
status: Confirmed → Fix Committed
Changed in sipserver:
status: Confirmed → Fix Committed
Changed in evergreen:
milestone: 3.next → 3.0-alpha
Changed in evergreen:
status: Fix Committed → Fix Released
Changed in sipserver:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.