Comments in CSV-Catalog

Bug #1826654 reported by Dirk Dickertmann on 2019-04-27
This bug affects 1 person
Affects: calibre
Importance: Undecided
Assigned to: Unassigned

Bug Description

Comments in CSV-Catalog
Calibre 3.41.3 (32 bit); Windows 7 (64bit);
(the bugs did not appear with Calibre up to 3.40.1)

 * <i> and </i> are replaced by underscores instead of asterisks
   (is this a permanent feature or a bug?)
 * some characters at the start of a line (e.g. + or -)
   are preceded by an extra backslash
 * long lines are broken by a newline (0x0a)

Test-Example: HTML source of 'Comments' in 'Edit Metadata':
================================
<div><p><b>Short Stories</b><br>
by <i>Anonymous</i></p>
<p>1) Story One<br>
2) Story <i>Two</i><br>
- Story Three<br>
- Story -Four-<br>
+ another Story<br>
+ another +Story+<br>
# another Story<br>
* another Story<br>
&gt; another Story<br>
... more Stories</p></div>
================================

Normal view in Calibre is ok.

Resulting CSV-file:
Field 'Comments' in CSV-Catalog
(Apache Open Office Calc 4.1.6):
========================
**Short Stories** ## <b> ok
by _Anonymous_ ## <i> and </i> replaced by
                        ## underscores (should be *)
1) Story One ## ok
2) Story _Two_ ## <i>, </i> replaced by _
\- Story Three ## Hyphen-minus at start of line
\- Story -Four- ## with extra backslash
\+ another Story ## leading Plus-sign
\+ another +Story+ ## with extra \
# another Story ## some other characters ok
* another Story ## ok
> another Story ## ok
... more Stories ## ok
========================

Katja (katjawy) on 2019-04-27
tags: added: catalogs

These are caused by an update to html2text, the library calibre uses for
this purpose. The update changed html2text's default behavior.
Most of these will be fixed in the next calibre release, which will
set options to make html2text behave as it used to. The exception is
the extra backslash before + and -.

That appears to be intended behavior, since those characters are list
markers in Markdown; see https://github.com/Alir3z4/html2text/issues/97
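
For anyone who needs the unescaped text in an exported catalog, here is a
minimal post-processing sketch. It is not part of calibre or html2text, and
the function name unescape_list_markers is hypothetical. It assumes the
'Comments' field has already been read from the CSV as a string, and it
removes only a backslash that sits directly before a + or - at the start of
a line.

```python
import re

# A backslash placed before a leading "+" or "-" (optionally after
# spaces or tabs) at the start of any line in the text.
_ESCAPED_MARKER = re.compile(r"^([ \t]*)\\([+-])", re.MULTILINE)


def unescape_list_markers(comment: str) -> str:
    """Drop the backslash before a leading '+' or '-' on each line.

    Hypothetical helper for illustration; backslashes elsewhere in the
    line are left untouched.
    """
    return _ESCAPED_MARKER.sub(r"\1\2", comment)


if __name__ == "__main__":
    sample = "\\- Story -Four-\n\\+ another +Story+\n# another Story"
    print(unescape_list_markers(sample))
```

Note that the result is no longer safe to render as Markdown, because the
lines starting with + or - would then be read as list items.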

Fixed in branch master. The fix will be in the next release. calibre is usually released every alternate Friday.

 status fixreleased

Changed in calibre:
status: New → Fix Released