OpenILS::SIP::clean_text() can crash

Bug #1542495 reported by Galen Charlton on 2016-02-05
22
This bug affects 4 people
Affects Status Importance Assigned to Milestone
Evergreen
Undecided
Unassigned
SIPServer
Undecided
Unassigned

Bug Description

On Debian Jessie, OpenILS::SIP::clean_text() can crash with "cannot decode string with wide characters" upon attempting to munge a data element with non-ASCII, non-Latin1 characters.

Perl: 5.20.2
Encode module version: 2.60
Evergreen 2.9.1 (and likely 2.8.x and earlier)

Galen Charlton (gmc) wrote :

The immediate cause of the breakage appears to be this pull request [1] in Encode 2.53.

[1] https://github.com/dankogai/p5-encode/pull/11

Galen Charlton (gmc) wrote :

And a small test case: works on Wheezy (Perl 5.14, Encode 2.47) but not Jessie:

#!/usr/bin/perl

use strict;
use warnings;
use utf8;
use Encode;

my $str = "中华人民共和国";
my $oct = decode_utf8($str);

binmode STDOUT, ':raw';
print $oct, "\n";

description: updated
Jeff Godin (jgodin) on 2016-03-17
Changed in evergreen:
status: New → Confirmed
Jason Stephenson (jstephenson) wrote :

I can confirm this is an issue on Ubuntu 16.04 also.

Checking for is_utf8($str) before attempting the decode doesn't help, because the sample given in comment #2 is utf8.

Neither does switching to decode('utf-8', $str).

So, how are we supposed to fix this?

Jason Stephenson (jstephenson) wrote :

Just a couple of minutes with Google and I see it.

It's already UTF-8 so no point in decoding/encoding it again.

I vote we remove clean_text from the Evergreen SIP code and go with a variation on my solution to https://bugs.launchpad.net/sipserver/+bug/1463943

If we handle it all in SIPServer's write_msg then there's only 1 place we need to deal with it.

Jason Stephenson (jstephenson) wrote :

One last comment for today:

It looks like decode has been confused for encode in the example provided.

From reading the documentation and some stackoverflow discussions, you'd really want endcode there and not decode.

I still stand by my suggestion of dropping clean_text and handling encoding in SIPServer from comment #4.

Jason Boyer (jboyer) wrote :

+1 to dropping clean_text and doing all of the final encoding as close to the copper as possible.

I’ve got 99 problems but encoding ain’t... oh.

Changed in sipserver:
status: New → Confirmed
Jason Stephenson (jstephenson) wrote :

So, I've assigned this to myself to work on it and related bug 1463943.

The plan is to remove OpenILS::SIP::clean_text() and do the encoding in SIPServer.

I believe something like clean_text will be moved/implemented there.

Changed in evergreen:
assignee: nobody → Jason Stephenson (jstephenson)
Changed in sipserver:
assignee: nobody → Jason Stephenson (jstephenson)
Jason Stephenson (jstephenson) wrote :

My plan for resolving this is to have the OpenILS::SIP modules expect to get UTF-8 from Evergreen, since that is what Evergreen uses internally in most cases. This should remove the need for any encode/decode tricks in OpenILS::SIP.

SIPServer will be modified and documented to expect UTF-8 from the backend. It will then handle encoding output from UTF-8 to what is expected by the configuration.

Jason Stephenson (jstephenson) wrote :

Actually, we should be decoding the text when pulling it out of the database and not encoding it until we're outputting it somewhere. All text should stay in the default Perl encoding until Perl is done with it. A thorough fix for that will require an audit of all the Perl code in Evergreen and OpenSRF, most likely.

(I finally started working on this for realz again.)

Jason Stephenson (jstephenson) wrote :

The branch at working/user/dyrcona/lp1542495-remove-OpenILS-SIP-clean_text is my attempt at fixing this. It must be applied in concert with the SIPServer branch on bug 1463943.

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/dyrcona/lp1542495-remove-OpenILS-SIP-clean_text

It has had limited testing on Ubuntu 16.04. I plan to test it on Ubuntu 14.04 and possibly Debian Wheezy and Debian Jessie.

I need to add a release note and emphasize the importance of upgrading SIPServer at the same time.

tags: added: needsreleasenote pullrequest
Jason Stephenson (jstephenson) wrote :

I added release notes to the branch and altered the sample oils_sip.xml.example configuration. I'm recommending that the encoding element move from implementation_config into the institution element.

I would like another pair of eyes on the release notes, please. I'm not exactly thrilled with them and if anyone can find a simpler way to explain things, I'd welcome any suggestions or edits.

I've tested this on Ubuntu 16.04 and 14.04 with Evergreen master and SIPServer master and the concerto dataset. Everything seems to work. Unfortunately, testing SIP is not something that we're setup to do, yet. I used Bill Erickson's pysip2 library to send some requests to the server both before and after patching. Things that were broken before the patch worked after.

Changed in evergreen:
assignee: Jason Stephenson (jstephenson) → nobody
milestone: none → 2.next
Changed in sipserver:
assignee: Jason Stephenson (jstephenson) → nobody
tags: removed: needsreleasenote
Martha Driscoll (mjdriscoll) wrote :

NOBLE has been running this code, along with the SIPServer code in lp1463943, on a production sip server running Evergreen 2.12.2 and Debian Jessie since January 2017. It resolves an issue with our state-wide delivery vendor who queries Evergreen via SIP for destination/owning library information. Their software would throw errors when encountering non-ascii characters.

The code has resolved the issue with the delivery vendor and has not caused any issues with the numerous other sip clients that query our server.

I think the release notes are clear and understandable.

Terran McCanna (tmccanna) wrote :

Added signedoff tag for Martha

tags: added: signedoff
Galen Charlton (gmc) wrote :

Pushed to master. Thanks, Jason and Martha!

Changed in evergreen:
status: Confirmed → Fix Committed
Changed in sipserver:
status: Confirmed → Fix Committed
Changed in evergreen:
milestone: 3.next → 3.0-alpha
Changed in evergreen:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Duplicates of this bug

Other bug subscribers