Allow selecting decode error behaviour

Bug #1318227 reported by Serhiy
This bug affects 2 people
Affects: python-html2text (Ubuntu)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Currently it stops conversion on any decode error:

$ html2markdown broken_text
Traceback (most recent call last):
  File "/usr/bin/html2markdown", line 9, in <module>
    load_entry_point('html2text==3.200.3', 'console_scripts', 'html2text')()
  File "/usr/lib/python3/dist-packages/html2text.py", line 781, in main
    data = data.decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x91 in position 4: invalid start byte

But for the files I'm working on, it would be perfectly fine to ignore the undecodable bytes instead:

data = data.decode(encoding, errors='ignore')

The error handler could be exposed as a command-line option.
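
To illustrate the request, here is a minimal sketch of how the error handler passed to `bytes.decode` changes the outcome, and how it might be wired to a command-line flag. The names `SAMPLE`, `decode_html` and `build_parser`, and the `--decode-errors` flag spelling, are assumptions made for this example; this is not html2text's actual code.

```python
import argparse

# Byte 0x91 at position 4 is a Windows-1252 curly quote. It is not a valid
# UTF-8 start byte, so strict decoding fails as in the traceback above.
SAMPLE = b"<em>\x91quoted\x92</em>"


def decode_html(data: bytes, encoding: str = "utf-8", errors: str = "strict") -> str:
    """Decode raw HTML bytes with the chosen codec error handler."""
    return data.decode(encoding, errors=errors)


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser for illustration only; not html2text's real CLI.
    parser = argparse.ArgumentParser(description="decode error handling sketch")
    parser.add_argument(
        "--decode-errors",
        dest="decode_errors",
        default="strict",
        choices=["strict", "ignore", "replace"],
        help="what to do when the input cannot be decoded",
    )
    return parser


# Simulate running the tool with the flag set to "ignore".
args = build_parser().parse_args(["--decode-errors", "ignore"])
print(decode_html(SAMPLE, errors=args.decode_errors))
```

With `strict` (the default) the call raises `UnicodeDecodeError`; `ignore` silently drops the invalid bytes, and `replace` substitutes U+FFFD for each one, which keeps a visible marker where data was lost.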

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: python3-html2text 3.200.3-2
ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9
Uname: Linux 3.13.0-24-generic x86_64
ApportVersion: 2.14.1-0ubuntu3
Architecture: amd64
CurrentDesktop: KDE
Date: Sat May 10 21:02:23 2014
PackageArchitecture: all
SourcePackage: python-html2text
UpgradeStatus: No upgrade log present (probably fresh install)

Serhiy (xintx-ua) wrote :
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in python-html2text (Ubuntu):
status: New → Confirmed
Stefano Rivera (stefanor) wrote :
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package python-html2text - 2016.1.8-1

---------------
python-html2text (2016.1.8-1) unstable; urgency=medium

  * New upstream release.
    - Long links wrapping controlled by `--no-wrap-links` (Closes: #616090)
    - Includes a --decode_errors option to customize decoding error behaviour
      (LP: #1318227)
  * Update manpage for new options.
  * Use https in the watch file.
  * Patch: python3.5 support

 -- Stefano Rivera <email address hidden> Fri, 15 Jan 2016 13:00:21 -0800

Changed in python-html2text (Ubuntu):
status: Confirmed → Fix Released