Bug #1796578 “Regression in 3.32.0: Search/replace not working i...” : Bugs : calibre

Kovid Goyal (kovid) wrote on 2018-10-07: Re: calibre bug 1796578

#1

Post the conversion log from the problem conversion (you can get it by clicking the rotating jobs button in the bottom right corner of the calibre window).

status incomplete

Changed in calibre:
status:	New → Incomplete

Jonas Christian (jonasvp) wrote on 2018-10-08:

#2

```
Buch 1 von 1 (The Long Descent: A User's Guide to the End of the Industrial Age) konvertieren
Conversion options changed from defaults:
 cover: u'/tmp/calibre_3.32.0_tmp_2mFCNf/kayB2U.jpeg'
 verbose: 2
 output_profile: 'cybook_opus'
 read_metadata_from_opf: u'/tmp/calibre_3.32.0_tmp_2mFCNf/PFnrNH.opf'
 search_replace: '[["<hr/>\\n<a id=\\"p\\\\d+\\"></a>\\\\d+ \\nThe Long Descent ", ""], ["<hr/>\\n<a id=\\"p\\\\d+\\"></a> \\n[^<]+ \\n\\\\d+ ", ""], ["^(.{60,}?)-? \\\\s+", "\\\\1"]]'
Resolved conversion options
calibre version: 3.32.0
{'asciiize': False,
 'author_sort': None,
 'authors': None,
 'base_font_size': 0.0,
 'book_producer': None,
 'change_justification': u'original',
 'chapter': u"//*[((name()='h1' or name()='h2') and re:test(., '\\s*((chapter|book|section|part)\\s+)|((prolog|prologue|epilogue)(\\s+|$))', 'i')) or @class = 'chapter']",
 'chapter_mark': u'pagebreak',
 'comments': None,
 'cover': u'/tmp/calibre_3.32.0_tmp_2mFCNf/kayB2U.jpeg',
 'debug_pipeline': None,
 'dehyphenate': True,
 'delete_blank_paragraphs': True,
 'disable_font_rescaling': False,
 'dont_split_on_page_breaks': False,
 'duplicate_links_in_toc': False,
 'embed_all_fonts': False,
 'embed_font_family': None,
 'enable_heuristics': False,
 'epub_flatten': False,
 'epub_inline_toc': False,
 'epub_toc_at_end': False,
 'epub_version': u'2',
 'expand_css': False,
 'extra_css': None,
 'extract_to': None,
 'filter_css': u'',
 'fix_indents': True,
 'flow_size': 260,
 'font_size_mapping': None,
 'format_scene_breaks': True,
 'html_unwrap_factor': 0.4,
 'input_encoding': None,
 'input_profile': <calibre.customize.profiles.InputProfile object at 0x7fb49af3a8d0>,
 'insert_blank_line': False,
 'insert_blank_line_size': 0.5,
 'insert_metadata': False,
 'isbn': None,
 'italicize_common_cases': True,
 'keep_ligatures': False,
 'language': None,
 'level1_toc': None,
 'level2_toc': None,
 'level3_toc': None,
 'line_height': 0.0,
 'linearize_tables': False,
 'margin_bottom': 5.0,
 'margin_left': 5.0,
 'margin_right': 5.0,
 'margin_top': 5.0,
 'markup_chapter_headings': True,
 'max_toc_links': 50,
 'minimum_line_height': 120.0,
 'new_pdf_engine': False,
 'no_chapters_in_toc': False,
 'no_default_epub_cover': False,
 'no_images': False,
 'no_inline_navbars': False,
 'no_svg_cover': False,
 'output_profile': <calibre.customize.profiles.CybookOpusOutput object at 0x7fb49af3ac50>,
 'page_breaks_before': u"//*[name()='h1' or name()='h2']",
 'prefer_metadata_cover': False,
 'preserve_cover_aspect_ratio': False,
 'pretty_print': True,
 'pubdate': None,
 'publisher': None,
 'rating': None,
 'read_metadata_from_opf': u'/tmp/calibre_3.32.0_tmp_2mFCNf/PFnrNH.opf',
 'remove_fake_margins': True,
 'remove_first_image': False,
 'remove_paragraph_spacing': False,
 'remove_paragraph_spacing_indent_size': 1.5,
 'renumber_headings': True,
 'replace_scene_breaks': u'',
 'search_replace': '[["<hr/>\\n<a id=\\"p\\\\d+\\"></a>\\\\d+ \\nThe Long Descent ", ""], ["<hr/>\\n<a id=\\"p\\\\d+\\"></a> \\n[^<]+ \\n\\\\d+ ", ""], ["^(.{60,}?)-? \\\\s+", "\\\\1"]]',
 'series': None,
 'series_index': None,
 'smarten_punctuation': False,
 'sr1_replace': None,
 'sr1_search': None,
 'sr2_replace': None,
 'sr2_search': None,
 'sr3_replace': None,
 'sr3_search': None,
 'start_reading_at': None,
 'subset_embedded_fonts': False,
 'tags': None,
 'timestamp': None,
 'title': None,
 'title_sort': None,
 'toc_filter': None,
 'toc_threshold': 6,
 'toc_title': None,
 'transform_css_rules': '[]',
 'unsmarten_punctuation': False,
 'unwrap_factor': 0.45,
 'unwrap_lines': True,
 'use_auto_toc': False,
 'verbose': 2}
InputFormatPlugin: PDF Input running
on /tmp/calibre_3.32.0_tmp_2mFCNf/e_j0sX.pdf
Converting file to html...
pdftohtml log:
Page-1
Page-2
Page-3
Page-4
Page-5
Page-6
Page-7
 link to page 272 link to page 268 link to page 262 link to page 254 link to page 238 link to page 234 link to page 204 link to page 170 link to page 126 link to page 86 link to page 48 link to page 14 Page-8
Page-9
Page-10
Page-11
Page-12
Page-13
Page-14
Page-15
Page-16
Page-17
Page-18
Page-19
Page-20
Page-21
Page-22
Page-23
Page-24
Page-25
Page-26
Page-27
Page-28
Page-29
Page-30
Page-31
Page-32
Page-33
Page-34
Page-35
Page-36
Page-37
Page-38
Page-39
Page-40
Page-41
Page-42
Page-43
Page-44
Page-45
Page-46
Page-47
Page-48
Page-49
Page-50
Page-51
Page-52
Page-53
Page-54
Page-55
Page-56
Page-57
Page-58
Page-59
Page-60
Page-61
Page-62
Page-63
Page-64
Page-65
Page-66
Page-67
Page-68
Page-69
Page-70
Page-71
Page-72
Page-73
Page-74
Page-75
Page-76
Page-77
Page-78
Page-79
Page-80
Page-81
Page-82
Page-83
Page-84
Page-85
Page-86
Page-87
Page-88
Page-89
Page-90
Page-91
Page-92
Page-93
Page-94
Page-95
Page-96
Page-97
Page-98
Page-99
Page-100
Page-101
Page-102
Page-103
Page-104
Page-105
Page-106
Page-107
Page-108
Page-109
Page-110
Page-111
Page-112
Page-113
Page-114
Page-115
Page-116
Page-117
Page-118
Page-119
Page-120
Page-121
Page-122
Page-123
Page-124
Page-125
Page-126
Page-127
Page-128
Page-129
Page-130
Page-131
Page-132
Page-133
Page-134
Page-135
Page-136
Page-137
Page-138
Page-139
Page-140
Page-141
Page-142
Page-143
Page-144
Page-145
Page-146
Page-147
Page-148
Page-149
Page-150
Page-151
Page-152
Page-153
Page-154
Page-155
Page-156
Page-157
Page-158
Page-159
Page-160
Page-161
Page-162
Page-163
Page-164
Page-165
Page-166
Page-167
Page-168
Page-169
Page-170
Page-171
Page-172
Page-173
Page-174
Page-175
Page-176
Page-177
Page-178
Page-179
Page-180
Page-181
Page-182
Page-183
Page-184
Page-185
Page-186
Page-187
Page-188
Page-189
Page-190
Page-191
Page-192
Page-193
Page-194
Page-195
Page-196
Page-197
Page-198
Page-199
Page-200
Page-201
Page-202
Page-203
Page-204
Page-205
Page-206
Page-207
Page-208
Page-209
Page-210
Page-211
Page-212
Page-213
Page-214
Page-215
Page-216
Page-217
Page-218
Page-219
Page-220
Page-221
Page-222
Page-223
Page-224
Page-225
Page-226
Page-227
Page-228
Page-229
Page-230
Page-231
Page-232
Page-233
Page-234
Page-235
Page-236
Page-237
Page-238
Page-239
Page-240
Page-241
Page-242
Page-243
Page-244
Page-245
Page-246
Page-247
Page-248
Page-249
Page-250
Page-251
Page-252
Page-253
Page-254
Page-255
Page-256
Page-257
Page-258
Page-259
Page-260
Page-261
Page-262
Page-263
Page-264
Page-265
Page-266
Page-267
Page-268
Page-269
Page-270
Page-271
Page-272
Page-273
Retrieving document metadata...
Syntax Error: Marked object is wrong type (boolean)
Generating manifest...
Rendering manifest...
Parsing all content...
Parsing index.html ...
Reading TOC from NCX...
Merging user specified metadata...
Detecting structure...
Flattening CSS and remapping font sizes...
Source base font size is 12.00000pt
Removing fake margins...
Found 1870 items of level: p_1
p_1 left margin stats: Counter({u'0': 1870})
p_1 right margin stats: Counter({u'0': 1870})
Cleaning up manifest...
Trimming unused files from manifest...
Creating EPUB Output...
Rescaling image from 901x1350 to 498x747 cover.jpeg
Splitting markup on page breaks and flow limits, if any...
		Splitting on page-break at id=calibre_pb_0
	Looking for large trees in index.html...
	Found large tree #0
		Splitting...
			Split point: {http://www.w3.org/1999/xhtml}p /*/*[2]/*[936]
			Split tree still too large: 410 KB
		Splitting...
			Split point: {http://www.w3.org/1999/xhtml}p /*/*[2]/*[468]
			Committed sub-tree #1 (198 KB)
			Committed sub-tree #2 (212 KB)
			Split tree still too large: 273 KB
		Splitting...
			Split point: {http://www.w3.org/1999/xhtml}p /*/*[2]/*[469]
			Committed sub-tree #3 (211 KB)
			Committed sub-tree #4 (62 KB)
	Split into 5 parts
EPUB output written to /tmp/calibre_3.32.0_tmp_2mFCNf/Tl6r6p.epub
```
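The `search_replace` value in the log above appears to be the Python repr of a JSON-encoded list of `[pattern, replacement]` pairs. A minimal sketch for decoding it outside calibre so the patterns can be inspected, assuming the JSON text is copied from the log with the repr's doubled backslashes undone. `load_rules` is a hypothetical helper, not a calibre API, and the regex flags calibre applies are not shown in the thread, so none are set here.

```python
import json
import re

# JSON text of the search_replace option from the log, with the Python
# repr escaping undone (each doubled backslash in the log becomes one).
SEARCH_REPLACE_JSON = r'''[["<hr/>\n<a id=\"p\\d+\"></a>\\d+ \nThe Long Descent ", ""], ["<hr/>\n<a id=\"p\\d+\"></a> \n[^<]+ \n\\d+ ", ""], ["^(.{60,}?)-? \\s+", "\\1"]]'''


def load_rules(text):
    """Hypothetical helper: decode [pattern, replacement] pairs and compile each pattern.

    No regex flags are passed; which flags calibre uses is an assumption left open.
    """
    return [(re.compile(pattern), replacement) for pattern, replacement in json.loads(text)]


if __name__ == "__main__":
    # Print each decoded pattern and its replacement, escapes made visible by repr().
    for regex, replacement in load_rules(SEARCH_REPLACE_JSON):
        print(repr(regex.pattern), "->", repr(replacement))
```

Printing the decoded patterns makes it easier to see exactly which characters (plain spaces, newlines, tags) each rule expects in the intermediate HTML.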

Kovid Goyal (kovid) wrote on 2018-10-08:

#3

I tried it with a PDF file in my library and it works for me. Attach the PDF file demonstrating/reproducing the problem to this bug report. You can do that by clicking the "Add attachment or patch" link at the bottom of the bug's page. If the file you are attaching is copyrighted, mark the bug as private. You can do this by clicking the tiny yellow icon next to "This report contains Public information" in the top right area of the bug's page.

status incomplete

LEONARDO TREVISAN LOMBARDI (ltlombardi) wrote on 2018-10-09:

#4

This is happening to me too. I tried both the 32-bit and 64-bit builds of v3.32, same situation: PDF to MOBI conversion with regex search and replace is not working. I downgraded to v3.22 and it works fine.

Jonas Christian (jonasvp) wrote on 2018-10-28:

#5

Example document where search/replace fails (83.4 KiB, application/pdf)

It doesn't seem to depend on the PDF at all. Attached please find one example of a document where it fails. I tried the most current version (3.33.1) and it still doesn't work, I'm staying on 3.31.0 for now...

Kovid Goyal (kovid) wrote on 2018-10-29:

#6

Works for me with that PDF. I tried the search expression:

limiting

and the replacement:

XXX

The word limiting was replaced as expected. Can you also post a search/replace expression that fails with this PDF?

Jonas Christian (jonasvp) wrote on 2018-10-29:

#7

Ok, this is interesting. I can reproduce that your replacement works, also using regular expressions such as "l.miting". The replacement I'm testing is replacing "^(.{50,}) " with "\1" in order to remove line breaks on long lines. That does not work from 3.32.0 onwards - the line breaks are still there. It works in 3.31.0.
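The third rule in the `search_replace` option logged in comment #2, `^(.{60,}?)-? \s+` replaced with `\1`, is an expression of this line-joining kind. A minimal sketch of its effect on plain text with Python's `re`, assuming multiline matching so that `^` anchors at the start of every line; calibre's actual flags and the HTML markup around each line in the converted file are not reproduced here, so this only illustrates the regex itself.

```python
import re

# Third rule from the logged search_replace option: a line of 60+ characters
# followed by an optional hyphen, a space and further whitespace (e.g. the
# newline) is joined to the next line. re.MULTILINE is an assumption.
LONG_LINE = re.compile(r"^(.{60,}?)-? \s+", re.MULTILINE)


def join_long_lines(text):
    """Apply the rule over the whole text; the replacement keeps only group 1."""
    return LONG_LINE.sub(r"\1", text)


if __name__ == "__main__":
    sample = "a" * 60 + "- \ncontinued\nshort line\n"
    print(join_long_lines(sample))
```

Lines shorter than 60 characters are left alone, which is why only the first line of the sample is joined to its successor.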

Kovid Goyal (kovid) wrote on 2018-10-29:

#8

Does it work in the wizard (click the magic wand icon next to the search field)?

Jonas Christian (jonasvp) wrote on 2018-10-29:

#9

Yes, it works in the wizard (all versions).

Kovid Goyal (kovid) wrote on 2018-11-13:

#10

I don't see how it could possibly work in the wizard and not in the actual conversion. And I cannot reproduce the failure on my Linux system. I'll test on Windows as well, when I am on a Windows computer.

Kovid Goyal (kovid) wrote on 2018-11-16:

#11

I tried it on my Windows machine as well with the above file and search and replace expressions, and the line breaks were successfully removed. I'm afraid that without some means to reproduce the issue, there is not much I can do, sorry.

Changed in calibre:
status:	Incomplete → Invalid

Jonas Christian (jonasvp) wrote on 2018-12-24:

#12

Sorry to keep bothering you about this, but the problem persists up to the current version, 3.36.0. For what it's worth, I'm on Ubuntu 18.04.

I tried narrowing it down and I think the culprit is trying to match an HTML tag. For instance, matching "limiting" or "l.m.ting" works fine, but trying to match "The " (right after the title) works _only_ in the wizard, not when actually converting.

As I said, it worked up until 3.31.0. Was there a change in escaping the angle brackets or something like that?

I'd be very grateful if you could have another look. Let me know if there's anything I can do to help.

Kovid Goyal (kovid) wrote on 2018-12-25:

#13

As I said, I cannot replicate the issue. Without some way to replicate the issue, I have no way to help.

Jonas Christian (jonasvp) wrote on 2018-12-25:

#14

What exactly have you tried to replicate? From your comment above it seems you only tried search/replace on a word, not on an HTML tag.

Also, could you point me to the general area in the code where the conversion happens? I could try having a look myself.

Kovid Goyal (kovid) wrote on 2018-12-25:

#15

I tried replicating it with the search expression and the file that were posted. Search/replace happens in preprocess.py.

Mike Bayer (zzzeek) wrote on 2019-08-31:

#16

Hi there -

I'm having exactly the same problem, using 3.36 on Fedora Linux. The wizard successfully finds all of the page numbers I'm looking for, of the form "\d+ ", and highlights them in yellow etc., but when I run the conversion to EPUB the search and replace does nothing at all. All the markup that the wizard claimed would be matched is unaffected.

Revision history for this message

Mike Bayer (zzzeek) wrote on 2019-08-31:

#17

When I run with debug, I can see that what is shown in the wizard as:

2

looks in the file input/index.html as:

2

I tried using the regex \d+\  instead, which I tested in Python 2 to make sure the interpreter matches it, which it does, but this works neither in the wizard nor in the output.

Is it possible that calibre's dependencies, like the Python version or other third-party libraries in use, can affect its behavior in this regard such that you're not able to reproduce it?
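A hypothetical sketch of the kind of check described here: testing a pattern against the exact serialized text rather than against what a preview displays. The thread does not establish what actually differed between the wizard and the conversion; as an assumed scenario only, the example shows that a pattern with a literal space matches neither a non-breaking space character nor an `&#160;` entity, while `\s` matches the character (in Python 3 `str` patterns) but not the entity. The three fragments are invented for illustration.

```python
import re

# Hypothetical fragments: the same page number serialized three ways.
PLAIN = "<p>2 </p>"             # ordinary ASCII space
NBSP_CHAR = "<p>2\u00a0</p>"    # decoded non-breaking space (U+00A0)
NBSP_ENTITY = "<p>2&#160;</p>"  # non-breaking space left as an HTML entity


def matches(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None


if __name__ == "__main__":
    for label, text in (("plain", PLAIN), ("char", NBSP_CHAR), ("entity", NBSP_ENTITY)):
        print(label, matches(r"\d+ ", text), matches(r"\d+\s", text),
              matches(r"\d+(?:\s|&#160;)", text))
```

Note that Python 2 byte-string patterns behave differently: there `\s` does not match U+00A0 unless the text is unicode and `re.UNICODE` is set, so a test in one interpreter may not carry over to another.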

Kovid Goyal (kovid) wrote on 2019-08-31:

#18

Use the official calibre binaries, not the distro calibre package, and you will be fine: it comes with all needed dependencies and is actually up-to-date as well.

calibre

Regression in 3.32.0: Search/replace not working in format conversion
