option to specify desired output file encoding

Bug #1042146 reported by Stefan Eriksson
This bug affects 1 person
Affects      Status   Importance  Assigned to  Milestone
irclog2html  Triaged  Wishlist    Unassigned

Bug Description

Hi, I'm having issues with charset encoding: the Apache web server is using the iso-8859-1 charset, but the filesystem on the server is UTF-8.

So my *log.html files are encoded in UTF-8. Even though I edited irclog2html.py to have UTF-8 headers, that is not enough to show "åäö" correctly. Is there a way to make irclog2html.py output iso-8859-15 encoded HTML files even though the OS uses UTF-8 by default, maybe with a new flag or something?

I have:
  <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />

but I still have to run this:
iconv -f utf-8 -t iso-8859-15 -c \#channel.2012-08-24.log.html > channel.2012-08-24.log.html

to get it working.
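
For reference, the iconv step above can be approximated in Python. This is only a sketch: the function name `reencode_html` and its parameters are made up for illustration and are not part of irclog2html. Passing `errors="ignore"` drops unconvertible characters (roughly what `iconv -c` does), while the default used here, `errors="xmlcharrefreplace"`, keeps them as numeric character references instead.

```python
def reencode_html(data, source="utf-8", target="iso-8859-15",
                  errors="xmlcharrefreplace"):
    """Re-encode HTML bytes from one charset to another.

    Hypothetical helper, not part of irclog2html.
    errors="ignore" silently drops characters the target charset
    cannot represent; errors="xmlcharrefreplace" turns them into
    numeric character references such as &#9731;.
    """
    text = data.decode(source)
    return text.encode(target, errors=errors)


if __name__ == "__main__":
    import sys

    # Usage: python reencode.py input.log.html output.log.html
    src_path, dst_path = sys.argv[1], sys.argv[2]
    with open(src_path, "rb") as src:
        converted = reencode_html(src.read())
    with open(dst_path, "wb") as dst:
        dst.write(converted)
```

Note that the charset declared in the HTML (and in the HTTP header) still has to match the bytes written to disk.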

summary: - encode log.html file with utf-8 encoding.
+ encode log.html file with specific encoding unrelated to default locale.
description: updated
Marius Gedminas (mgedmin) wrote : Re: encode log.html file with specific encoding unrelated to default locale.

I feel I should mention that the easiest (and most correct) way to fix your issue would be to fix your Apache configuration (AddDefaultCharset UTF-8), because the charset specified in the Content-Type header generated by Apache always overrides whatever <meta charset> you have in the HTML body.
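
A small illustration of the likely failure mode, assuming the browser decodes the UTF-8 bytes as ISO-8859-1 because of the HTTP header. The function name `misread_as` is invented for this example:

```python
def misread_as(text, actual="utf-8", assumed="iso-8859-1"):
    """Show what a reader sees when bytes written in one charset
    are decoded with another (hypothetical helper for illustration)."""
    return text.encode(actual).decode(assumed)


# Each of the three letters becomes two Latin-1 characters,
# because UTF-8 encodes each of them as two bytes.
print(misread_as("åäö"))
```

Once the header and the file bytes agree, the round trip is lossless and the text displays as written.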

I'm willing to consider a patch to irclog2html that would let the user specify the desired output encoding. The backend already supports this (see the `charset` class attribute of the various style classes), and the code already knows how to handle characters that cannot be represented in the output charset (by encoding them as numerical character references). All that is missing would be a command-line option like --output-charset=UTF-8 that would override formatter.charset if specified. The trickiest bit is probably handling the default charset -- currently every style can have a different default, and my unreasonable attachment to backwards compatibility inclines me to keep it that way.
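
A rough sketch of what such an option could look like. Everything here is an assumption made for illustration: the two style classes are stand-ins rather than the real irclog2html classes, `resolve_charset` and `encode_output` are invented names, and argparse is used only to keep the example short. Only the `charset` class attribute and the `--output-charset` option name come from the description above.

```python
import argparse


class ModernStyle:
    """Stand-in for a newer style (hypothetical, not the real class)."""
    charset = "UTF-8"


class LegacyStyle:
    """Stand-in for one of the older styles (hypothetical)."""
    charset = "ISO-8859-1"


def build_parser():
    parser = argparse.ArgumentParser(prog="irclog2html-sketch")
    # No default here: None means "keep the per-style default",
    # which preserves backwards compatibility.
    parser.add_argument("--output-charset", default=None,
                        help="override the style's default charset")
    return parser


def resolve_charset(style, override=None):
    """Return the override if given, else the style's own default."""
    return override if override else style.charset


def encode_output(html, charset):
    """Encode HTML text; characters the charset cannot represent
    become numeric character references."""
    return html.encode(charset, errors="xmlcharrefreplace")


if __name__ == "__main__":
    args = build_parser().parse_args()
    charset = resolve_charset(ModernStyle(), args.output_charset)
    print(encode_output("<p>åäö \u2603</p>", charset))
```

Leaving the option's default as None is one way to keep every style's existing default intact unless the user explicitly asks for something else.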

Incidentally, you can get irclog2html to produce ISO-8859-1 HTML files today by choosing any of the older styles (simplett, tt, simpletable, table) with irclog2html --style stylename.

Changed in irclog2html:
status: New → Triaged
importance: Undecided → Wishlist
summary: - encode log.html file with specific encoding unrelated to default locale.
+ option to specify desired output file encoding