Comment 4 for bug 1610678

Jakub Kotrla (0-jakub) wrote :

Hi, I am a developer writing an AddedContent plugin that loads data from the server obalkyknih.cz, which provides, among other things, tables of contents (TOC). I am working with Linda.

The encoding of the TOC shown in Evergreen by our new plugin was wrong (accented letters were replaced by strange symbols).

I've tried a lot of tricks and investigated a lot. Here is what I've found:
- the server obalkyknih.cz provides the TOC as UTF-8
- the TOC shown in Evergreen by our plugin is double-encoded: when I took the original UTF-8 TOC and converted it from "ISO-8859-1" to "UTF-8", I got the same result as what Evergreen showed
- therefore I suspect that somewhere in the process the already UTF-8-encoded TOC is encoded again (as if converting from "ISO-8859-1" to "UTF-8")
- I've tried using the Encode and utf8 Perl modules
- I've tried writing the TOC to the logger, and the content of the log file is correct, maybe because the Evergreen Logger calls binmode(SINK, ':utf8'); in sub _write_file
- I've tried adding the line binmode(STDOUT, ":utf8"); to the module AddedContent.pm, with no success
- I've even tried adding the encoding to the content type of the returned added content, using the following line in our AddedContent handler:
return { content_type => 'text/html; charset=utf-8', content => $c };
- interestingly, at a URL of the form http://evergreen-server/opac/extras/ac/toc/html/r/23225 I could see the TOC in the correct encoding
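
The double encoding described in the list above can be reproduced in a few lines of standalone Perl. This is only a sketch under stated assumptions: the sample string is made up, and nothing here is taken from the plugin, from AddedContent.pm or from Evergreen. It shows the explicit "ISO-8859-1" to "UTF-8" conversion, and also that printing raw UTF-8 bytes through a handle with a :utf8 layer gives the same result, which is one possible way such double encoding can happen (not a confirmed cause here).

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical sample text with Czech accented letters, as a Perl character string.
my $chars = "P\x{159}\x{ed}li\x{161}";

# The UTF-8 byte sequence, i.e. what a server sending UTF-8 would deliver.
my $utf8_bytes = encode('UTF-8', $chars);

# Explicit double encoding: read the UTF-8 bytes as if they were
# ISO-8859-1 characters, then encode them to UTF-8 a second time.
my $double = encode('UTF-8', decode('ISO-8859-1', $utf8_bytes));

# Implicit double encoding: print the raw UTF-8 bytes through a handle
# that has a :utf8 layer (an in-memory handle is used to capture the output).
my $sink = '';
open(my $fh, '>:utf8', \$sink) or die "cannot open in-memory handle: $!";
print {$fh} $utf8_bytes;
close($fh);

# Decoding the bytes first and printing characters avoids the problem.
my $clean = '';
open(my $fh2, '>:utf8', \$clean) or die "cannot open in-memory handle: $!";
print {$fh2} decode('UTF-8', $utf8_bytes);
close($fh2);

printf "utf8 bytes: %d, double-encoded bytes: %d\n",
    length($utf8_bytes), length($double);
print "the :utf8 layer produced the double-encoded form\n" if $sink eq $double;
print "decoding before printing kept the original bytes\n" if $clean eq $utf8_bytes;
```

If something similar happens in the real request path, the content would need to be either bytes written to a raw handle or decoded characters written to a :utf8 handle, but not bytes written to a :utf8 handle.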

What I do not understand is where the TOC gets encoded wrongly. How is the AddedContent.pm handler called? Who calls it? The only thing I've understood is that the AddedContent.pm handler is called from some other part of Evergreen via some kind of network call, because the handler, in sub print_content, first writes a line that looks like an HTTP header (print "Content-type: $ct\n\n";) and then the content itself.

I do not know how Evergreen and OpenSRF work internally, but it seems to me that the AddedContent.pm module provides the TOC in the correct encoding and some other part of Evergreen then messes with it and shows the TOC double-encoded.

Any ideas, help or explanation of who calls the AddedContent.pm handler would be greatly appreciated.