AddedContent: Invalid ISBN's are sent to Content Cafe as blank string

Bug #1549393 reported by Josh Stompro
12
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Evergreen
Fix Released
Medium
Unassigned
2.8
Fix Released
Medium
Unassigned
2.9
Fix Released
Medium
Unassigned

Bug Description

EG 2.8.4

Hello, I've been trying to figure out why cover art isn't showing up for one particular title and I think I might have found a bug.

Title: Bridge of Spies
ISBN: 9786316271976
Record ID: 197084
https://egcatalog.larl.org/eg/opac/record/197084?query=Bridge%20of%20spies;qtype=title;fi%3Asearch_format=;locg=1

That record used to have several ISBN's and UPC's, I removed all but one to simplify.

Asking for the cover art based on the ISBN works fine:
http://egcatalog.larl.org/opac/extras/ac/jacket/large/9786316271976

But asking based on the record id fails: (And the memcache entries have been purged before this test).
http://egcatalog.larl.org/opac/extras/ac/jacket/large/r/197084

I did a packet capture to find out what was being sent/received from Content Cafe when the request based on the record ID was made and found that we were sending a blank request to content cafe. The key/ISBN wasn't being entered.

<?xml version="1.0" encoding="utf-8"?>
<ContentCafe xmlns="http://ContentCafe2.btol.com" DateTime="2016-02-24T15:22:35"><RequestItems UserID="XXXXX" Password="YYYY"><RequestItem><Key></Key><Content Type="M" Encoding="HEX">JacketDetail</Content></RequestItem></RequestItems></ContentCafe>

Content Cafe returns an error response of "<Error>Invalid key text element: Invalid length (0)</Error>"

So I created a perl script to emulate what the added content handler does, and it worked like it should right up until I added in the Business::ISBN validation. Then I found out that the ISBN I'm testing with fails the Business::ISBN validation because it doesn't have a valid group code.

What seems to be happening is that the invalid ISBN's get passed through as blank keys to the AddedContent handler, which adds them to the request that gets sent to the provider. So one invalid ISBN will prevent any added content from being returned.

$isbn_str = $isbn_obj->as_string([]) if defined($isbn_obj);
This will return a blank string if the ISBN is invalid.

From
http://git.evergreen-ils.org/?p=Evergreen.git;a=blob;f=Open-ILS/src/perlmods/lib/OpenILS/WWW/AddedContent.pm;h=a1f1767216333a3403cc1a8350c88dda75d7702e;hb=HEAD#l159

I'll do some more testing to see what the packet capture looks like when there are multiple keys.

I also want to figure out if that ISBN is really invalid or if the Business::ISBN::Data information is just out of date.
Josh

Revision history for this message
Josh Stompro (u-launchpad-stompro-org) wrote :

Ok, Now I think I get why this ISBN isn't valid, most DVD's don't qualify for a real ISBN so the ISBN in this case is an unofficial one created by Baker&Taylor for usability and integration purposes. The group code 631 isn't an official number. 978-631-627197-6

Business::ISBN::Data doesn't have anything for 631 which is why it fails the ISBN check.
http://cpansearch.perl.org/src/BDFOY/Business-ISBN-Data-20140910.003/lib/Business/ISBN/Data.pm

I just looked through our catalog and for the first 10 I checked, the DVD's with no cover art are ones we purchased through Baker&Taylor and include a unofficial ISBN number.

Does anyone know if excluding unofficial but well formatted ISBN numbers was a goal of using the Business::ISBN validation?

Thanks
Josh

Revision history for this message
Josh Stompro (u-launchpad-stompro-org) wrote :

Here is a confirmation that one blank key entry in the xml request to Content Cafe will kill the whole request. Here is another DVD title with two entries for the unofficial B&T ISBN, and two UPC codes. The two ISBN's are passed as blank strings.

<?xml version="1.0" encoding="utf-8"?>
<ContentCafe xmlns="http://ContentCafe2.btol.com" DateTime="2016-02-25T16:36:09"><RequestItems UserID="XXXXX" Password="WHYWHY"><RequestItem><Key></Key><Content>TocDetail</Content></RequestItem><RequestItem><Key></Key><Content>TocDetail</Content></RequestItem><RequestItem><Key>025192258992</Key><Content>TocDetail</Content></RequestItem><RequestItem><Key>025192258916</Key><Content>TocDetail</Content></RequestItem></RequestItems></ContentCafe>

HTTP/1.1 200 OK
Cache-Control: private, max-age=0
Content-Type: text/xml; charset=utf-8
Server: Microsoft-IIS/7.0
X-AspNet-Version: 2.0.50727
X-Powered-By: ASP.NET
Date: Thu, 25 Feb 2016 16:36:08 GMT
Content-Length: 313

<?xml version="1.0" encoding="utf-8"?>
<ContentCafe xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" DateTime="2016-02-25T11:36:08.1468367-05:00" xmlns="http://ContentCafe2.btol.com">
  <Error>Invalid key text element: Invalid length (0)</Error>
</ContentCafe>

Revision history for this message
Mike Rylander (mrylander) wrote :

The goal of using B::ISBN was to make sure we pass actual ISBNs. I'd tend to toss this at B&T ask they skip emtpy <Key>s (because not every bug is ours to fix) but I suspect they may be restricted by their choice of tools.

I think it would be reasonable to use grep in front of the maps on lines 185-187 to remove empty strings, which would solve the empty-key problem, in the least. But I don't think we should stop validating ISBNs generally.

Revision history for this message
Josh Stompro (u-launchpad-stompro-org) wrote :

Thanks for the feedback Mike.

I just tried your suggestion and did some more testing. I kept saying empty strings, but really the issue is with undef values in the arrays.

I added a pair of greps to remove the undef's from @isbns and @issns and now it works great, tested on our production system.

Content Cafe doesn't support ISSNs, but they have the same behavior of setting invalid numbers as undef, so I think it makes sense to clean them up also.

Working branch coming soon.

Josh

Revision history for this message
Josh Stompro (u-launchpad-stompro-org) wrote :

Working Branch at:
 user/stompro/lp1549393_remove_undef_isbn_added_content

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/stompro/lp1549393_remove_undef_isbn_added_content

If anyone needs Content Cafe credentials to test I would be happy to supply some.

Thanks
Josh

tags: added: pullrequest
Revision history for this message
Ben Shum (bshum) wrote :

Setting an initial review target of 2.next, but as a bug fix, this can likely be pointed at 2.10-rc when that milestone gets created.

Applied fix and tested against concerto bibs (and some LOC imported records) and the stock OpenLibrary provider and nothing blows up (aka, I still see covers). Will try out some other providers later, but on the surface, this change seems fine.

Changed in evergreen:
importance: Undecided → Medium
milestone: none → 2.next
status: New → Triaged
Revision history for this message
Galen Charlton (gmc) wrote :

Agreed that this is should be targeted for 2.10-rc. I mildly disagree with Mike about the level of validation required; I think for the purpose of looking up added content, verifying checksums is sufficient; Evergreen doesn't need to be the ISBN police here. That may well be a discussion for another bug; on the face of it, Josh's patch offers an improvement either way.

Changed in evergreen:
milestone: 2.next → 2.10-rc
Revision history for this message
Galen Charlton (gmc) wrote :

Pushed all the way to rel_2_8. Thanks, Josh!

Changed in evergreen:
status: Triaged → Fix Committed
Changed in evergreen:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.