Non-ascii Unicode characters in messages cause SIP client problems

Bug #1463943 reported by Jason Stephenson
This bug affects 2 people
Affects    Status        Importance  Assigned to  Milestone
Evergreen  Fix Released  Medium      Unassigned
SIPServer  Fix Released  Medium      Unassigned

Bug Description

Most SIP vendors expect ASCII output in SIP messages. The SIP standard says that messages should be in ASCII unless the client and server agree otherwise. Since we generally have no way of knowing whether the client and server have agreed otherwise, SIPServer should by default return only valid ASCII characters in response messages.

This came up with one vendor that uses SIP to look up the transit destinations of copies in delivery. One of our copies had a call number with an accented "a" in it, and the vendor's software could not parse the response message because of the field containing that call number.

We checked, and we have a few hundred call numbers with accented or otherwise special characters in them, not to mention authors and titles that may contain such characters.

The branch below modifies add_field in Sip.pm to replace non-ASCII characters in the field with a numeric entity representation, similar to what is already done with the | (field delimiter) character. These representations may not be correct Unicode, but they seem to work in the absence of a better solution.
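The escaping described above can be sketched roughly as follows. This is an illustration in Python, not the Perl code from the branch: the function name mirrors add_field, but its signature, the decimal entity form, and the "CS" field identifier used in the example are assumptions made for the sketch.

```python
def add_field(field_id: str, value: str, delimiter: str = "|") -> str:
    """Build one SIP field, escaping the delimiter and any non-ASCII character.

    Hypothetical sketch: each escaped character becomes a decimal numeric
    entity such as &#225;, so the resulting field is pure ASCII.
    """
    escaped = []
    for ch in value:
        if ch == delimiter or ord(ch) > 0x7F:
            escaped.append(f"&#{ord(ch)};")
        else:
            escaped.append(ch)
    return f"{field_id}{''.join(escaped)}{delimiter}"


# A made-up call number containing an accented "a" (U+00E1).
print(add_field("CS", "Caf\u00e1 123"))
```

A client that does not understand the entities will show them literally, but it can still split the message into fields because every byte is ASCII.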

Jason Stephenson (jstephenson) wrote :

Branch is user/dyrcona/lp1463943_ensure_ascii_output in the working repository:

http://git.evergreen-ils.org/?p=working/SIPServer.git;a=shortlog;h=refs/heads/user/dyrcona/lp1463943_ensure_ascii_output

tags: added: pullrequest
Jason Stephenson (jstephenson) wrote :

    To reproduce this bug:

    1) Add an accented or other non-ASCII Unicode character to the call
    number associated with a copy.

    2) Connect with a SIP client and send an item information message
    (64) to the server.

    3) Your SIP client should do 1 of 3 things:
       a) It will choke on the response.
       b) It will display gibberish for the UTF-8 sequence in the response.
       c) It will correctly display the call number.

    After applying this patch, your client should handle the same item
    information message response with no trouble. If you inspect the
    call number information, the UTF-8 sequences that potentially caused
    trouble before should now be replaced by XML-escaped numeric
    entities, e.g. &#169; for a copyright symbol.
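The three client behaviours in step 3 correspond to three ways of interpreting the same UTF-8 bytes. The following Python sketch is only an illustration of that idea; the call number and the client_view helper are made up, and real SIP clients may fail in other ways.

```python
# A made-up call number containing an accented "a" (U+00E1).
call_number = "B\u00e1 123"

# The bytes an unpatched server would put on the wire for this field.
wire_bytes = call_number.encode("utf-8")


def client_view(data: bytes, assumed_encoding: str) -> str:
    """Decode the bytes the way a client assuming this encoding would."""
    try:
        return data.decode(assumed_encoding)
    except UnicodeDecodeError:
        return "<decode error>"


# ascii   -> the client chokes on the response
# latin-1 -> the client shows gibberish for the two-byte UTF-8 sequence
# utf-8   -> the client shows the call number correctly
for enc in ("ascii", "latin-1", "utf-8"):
    print(enc, repr(client_view(wire_bytes, enc)))
```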

Galen Charlton (gmc) wrote :

Here's an alternative approach from Koha's fork of SIPServer. It still lets SIP clients that *can* handle non-ASCII responses receive them:

http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=9865

Jason Stephenson (jstephenson) wrote :

Removing the pullrequest tag in light of IRC conversation.

This "feature" needs to be configurable for those who need to keep UTF-8 compatibility.

Question is: Should the new option go on the institution or the account?

tags: removed: pullrequest
Galen Charlton (gmc) wrote :

In the interests of not adding (yet another) small roadblock to reunifying the forks, I mildly prefer making the option be at the account level.

Jason Stephenson (jstephenson) wrote :

I am in favor of the approach taken by Koha. I'll work on a new fix going in that direction.

Jason Stephenson (jstephenson) wrote :
tags: added: pullrequest
Jason Stephenson (jstephenson) wrote :

BTW, there is already a global encoding option, set to ascii by default, that appears to do nothing.

I'm commenting on this bug again because we had to configure this for one of our libraries today.

We've been using this branch in production since this summer with no problems reported.

Jeff Godin (jgodin) wrote :

There is an existing institution_config-level setting for encoding.

This may be the "global encoding option [...] that appears to do nothing" in comment #8 above.

It does indeed do something within OpenILS::SIP, and there's probably some conflict resolution to be done between the two approaches.

See commit 7bca4bf4070c06c810ed546e30a9bb5749776f28 in Evergreen for additional context:

http://git.evergreen-ils.org/?p=Evergreen.git;a=commit;h=7bca4bf

Jason Stephenson (jstephenson) wrote :

Yes, that is the setting referred to in comment #8.

Ours is set to ascii, but we still get UTF-8. That is why I said "appears to do nothing."

I prefer the Koha-style approach, because it can be set per account.

If you want to merge the two or come up with something better, I'll be happy to try it.

Jeff Godin (jgodin)
Changed in sipserver:
assignee: nobody → Jeff Godin (jgodin)
Jason Boyer (jboyer) wrote :

Since there's already a setting that tries to do something, it would be nice if that setting provided the default and the per-account option acted as the override. That way, if all clients can handle the same encoding (UTF-8 or ASCII), setup is very simple, and only the outliers need to be configured individually.
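The lookup order proposed here (account setting first, then institution setting, then a built-in default) could look roughly like the Python sketch below. The function and parameter names are hypothetical, and the "ascii" fallback is an assumption based on the default mentioned earlier in this thread; the real configuration handling lives in SIPServer's Perl code.

```python
from typing import Optional


def effective_encoding(
    account_encoding: Optional[str],
    institution_encoding: Optional[str],
    fallback: str = "ascii",
) -> str:
    """Pick the encoding for one SIP login.

    Hypothetical sketch: a per-account value overrides the institution
    value, and an unset (None or empty) value falls through to the next level.
    """
    return account_encoding or institution_encoding or fallback


# Institution default is UTF-8; one outlier account needs ASCII.
print(effective_encoding(None, "UTF-8"))
print(effective_encoding("ascii", "UTF-8"))
```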

Jason Stephenson (jstephenson) wrote :

Yeah, that's exactly what I was thinking in some of the IRC conversation about this branch. I'll fix it up to do that and hopefully it works with the multiplex mode, too.

Jason Stephenson (jstephenson) wrote :

All right. I've rebased the branch to resolve a conflict and added a commit to use either the account or institution encoding setting.

I've not tested this yet, but will hopefully get that chance soon.

For the record, the proper branch is user/dyrcona/lp1463943_encoded_response

http://git.evergreen-ils.org/?p=working/SIPServer.git;a=shortlog;h=refs/heads/user/dyrcona/lp1463943_encoded_response

Jason Stephenson (jstephenson) wrote :

I dropped the pullrequest tag because it looks like we need a more comprehensive solution to encoding issues with SIPServer and Evergreen than provided by the branches I've created so far.

tags: removed: pullrequest
Jason Stephenson (jstephenson) wrote :

So, I think I want to tackle this in Evergreen. I'll move the bug over there.

It looks like all we have to do is use the clean_text method more thoroughly, including on autoloaded fields.

summary: - Non-ascii Unicode characters in messages cause client problems
+ Non-ascii Unicode characters in messages cause SIP client problems
Jason Stephenson (jstephenson) wrote :

I'm going 'round in circles, but after a brief mention of this in IRC with Jeff Godin, we both seem to think that fixing it in SIPServer and eliminating clean_text in Evergreen's SIP modules is the way to go.

This means people will need to coordinate their Evergreen and SIPServer upgrades.

Perhaps it is time for SIPServer releases?

Changed in sipserver:
assignee: Jeff Godin (jgodin) → nobody
Changed in evergreen:
status: New → Confirmed
Changed in sipserver:
status: New → Confirmed
Changed in evergreen:
assignee: nobody → Jason Stephenson (jstephenson)
Changed in sipserver:
assignee: nobody → Jason Stephenson (jstephenson)
Jason Stephenson (jstephenson) wrote :

My plan for resolving this is to have the OpenILS::SIP modules expect to get UTF-8 from Evergreen, since that is what Evergreen uses internally in most cases. This should remove the need for any encode/decode tricks in OpenILS::SIP.

SIPServer will be modified and documented to expect UTF-8 from the backend. It will then handle encoding output from UTF-8 to what is expected by the configuration.

Jason Stephenson (jstephenson) wrote :

So, SIPServer is going to expect to get text as decoded Perl character strings, i.e. text not yet encoded to any particular byte encoding. It will encode to the appropriate character set in write_msg, before calculating checksums, etc.
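Encoding before the checksum matters because the SIP2 checksum is computed over the characters actually sent, and the same text can occupy a different number of bytes in each encoding. The Python sketch below illustrates that ordering only; it is not the Perl write_msg from SIPServer. The function signatures, the single-digit sequence number, and the use of "?" for characters the target encoding cannot represent are assumptions made for the example.

```python
def sip_checksum(payload: bytes) -> str:
    """SIP2 checksum: 16-bit two's complement of the byte sum, as 4 hex digits."""
    total = sum(payload) & 0xFFFF
    return f"{(-total) & 0xFFFF:04X}"


def write_msg(msg: str, encoding: str = "ascii", seqno: int = 0) -> bytes:
    """Encode a decoded message first, then checksum the encoded bytes.

    Assumption for this sketch: unencodable characters become "?".
    """
    body = (msg + f"AY{seqno}AZ").encode(encoding, errors="replace")
    return body + sip_checksum(body).encode("ascii") + b"\r"


title_field = "AJCaf\u00e9|"  # made-up title field with an accented "e"
for enc in ("utf-8", "iso-8859-1", "ascii"):
    print(enc, write_msg(title_field, enc))
```

A receiver can verify the message by adding the byte sum of everything up to and including "AZ" to the transmitted checksum; the low 16 bits of the result should be zero.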

Jason Stephenson (jstephenson) wrote :

I have rebased the previous branch on this bug (see comment 13) and added some changes. I've tested titles that have diacritics in them with the UTF-8, ISO-8859-1, and ASCII encodings. It works with the changes for bug 1542495 applied to Evergreen.

tags: added: pullrequest
Changed in sipserver:
assignee: Jason Stephenson (jstephenson) → nobody
Changed in evergreen:
assignee: Jason Stephenson (jstephenson) → nobody
Changed in evergreen:
milestone: none → 2.next
Martha Driscoll (mjdriscoll) wrote :

NOBLE has been running this code, along with the Evergreen code in lp1542495, on a production SIP server running Evergreen 2.12.2 on Debian Jessie since January 2017. It resolves an issue with our statewide delivery vendor, which queries Evergreen via SIP for destination/owning library information. Their software would throw errors when encountering non-ASCII characters.

The code has resolved the issue with the delivery vendor and has not caused any issues with the numerous other SIP clients that query our server.

Martha Driscoll (mjdriscoll) wrote :
Terran McCanna (tmccanna) wrote :

Added signedoff tag for Martha

tags: added: signedoff
Galen Charlton (gmc) wrote :

Pushed to master. Thanks, Jason and Martha!

Changed in evergreen:
milestone: 3.next → 3.0-alpha
importance: Undecided → Medium
Changed in sipserver:
importance: Undecided → Medium
Changed in evergreen:
status: Confirmed → Fix Committed
Changed in sipserver:
status: Confirmed → Fix Committed
Changed in evergreen:
status: Fix Committed → Fix Released
Changed in sipserver:
status: Fix Committed → Fix Released