Beautify all files does not work well

Bug #1836205 reported by Orpheu on 2019-07-11
This bug affects 1 person
Affects: calibre
Importance: Undecided
Assigned to: Unassigned

Bug Description

In the editor:

1) Beautifying the attached file does not work well: the 3rd div from the start is not indented correctly.

If I delete <span id="page_0"/> from the 2nd div, beautifying works correctly.

The same problem occurs with any of:

<span id="page_0"></span>
<a id="epub_file_c01_chapter"/>
<a id="epub_file_c01_chapter"></a>

With EPUB 3, this kind of tag is more and more common, so beautifying fails more and more often.

2) Requested improvement

The help says: "Beautify algorithm only beautifies block level tags that contain other block level tags".

When we run "Remove unused CSS rules", a screen with a checkbox appears first. It would be very useful if "Beautify all files" likewise started with a warning screen offering a checkbox to also beautify the block level tags that do not contain other block level tags. This box would be unchecked by default.

Orpheu (marcel854-poire) wrote :

This is not worth the effort for me personally. If someone else wants to implement it, I will be happy to supply any needed guidance. If so, re-open the ticket and we can discuss it.

 status wontfix

Changed in calibre:
status: New → Won't Fix
Orpheu (marcel854-poire) on 2019-07-11
description: updated
description: updated
Orpheu (marcel854-poire) wrote :

I identified 2 cases in which beautifying the files does not work. Each has a workaround: run a regex before beautifying.

1) The case described above:

- an "a" or "span" tag that is self-closing or immediately followed by its closing tag (so it contains no text); this is typical of pagination markers,
- or a self-closing "br" tag located after a closing "p" or "div" tag (and before the next opening "p" or "div" tag),
- or this new case:

<body epub:type="bodymatter" lang="fr" xml:lang="fr" id="UGI0-4418" class="calibre">
<div class="calibre1"></div>
<section class="chap" type="chapter"><span epub:type="pagebreak" title="5"></span>
  <div class="calibre2"></div>

In that last case, the cause is that the pagination marker sits inside the "section" tag, immediately after its opening tag.
The page number can be given in an "id" attribute, as shown at the beginning, or in a "title" attribute, as in the last example.

Running the following regex first lets Beautify work correctly:

https://regex101.com/r/YdWa7o/3

When there is a sequence of "div" tags at the beginning of the xhtml file, as in the attached file, the regex moves the span tag up to just after the body tag. Try it on the attached file.
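For reference, the patterns listed in case 1 can also be located on a parsed tree instead of with a regex. The following is only a minimal sketch using Python's standard xml.etree.ElementTree; it is not the regex linked above and not calibre code. The function name find_markers, the SAMPLE document, and the simplification that a "br" is reported whenever its previous sibling is a "p" or "div" are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: only "p" and "div" count as the block tags that precede a "br".
BLOCK = {"p", "div"}

def local(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]

def find_markers(xhtml_text):
    """Return (tag, label) pairs for empty a/span tags and for br tags
    that directly follow a p or div sibling."""
    root = ET.fromstring(xhtml_text)
    found = []
    for parent in root.iter():
        children = list(parent)
        for i, child in enumerate(children):
            name = local(child.tag)
            # Self-closing and open/close-with-nothing-inside parse the same way.
            empty = len(child) == 0 and not (child.text or "").strip()
            if name in ("a", "span") and empty:
                found.append((name, child.get("id") or child.get("title") or ""))
            elif name == "br" and i > 0:
                prev = children[i - 1]
                if local(prev.tag) in BLOCK and not (prev.tail or "").strip():
                    found.append(("br", ""))
    return found

# Hypothetical sample covering the three patterns plus one span with real text.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<body>
<div><span id="page_0"/></div>
<section><span epub:type="pagebreak" title="5"></span>
  <div><p>Text</p><br/><p>More</p></div>
</section>
<p>A <span>real</span> span</p>
</body>
</html>"""

for tag, label in find_markers(SAMPLE):
    print(tag, label)
```

The span that contains text is not reported, so a check like this only flags the empty markers that appear to disturb beautifying.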

2) A block tag that does not contain another block tag has \n characters inside it. Replacing every \n in the xhtml file with a space lets Beautify work correctly.
Do not replace \n in the nav.xhtml file, if there is one.

The following regex tests whether it is useful to replace \n:

https://regex101.com/r/DA1uFt/1
Select the "dot matches all" option.

If this regex finds at least one match, the following regex must be run before beautifying:

https://regex101.com/r/br5eV6/2
Select the "dot matches all" option.

The pattern avoids processing the nav.xhtml file.
If you choose "all text files", you can simply replace \n with \x20 in the last regex, because this option already skips the nav.xhtml file.
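The logic of case 2 (test first, then replace, and leave nav.xhtml alone) could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the regexes linked above: the BLOCK tag set, the function names needs_newline_fix and flatten_newlines, and matching the navigation file by the base name "nav.xhtml" are all hypothetical choices.

```python
import posixpath
import xml.etree.ElementTree as ET

# Assumed set of block level tags; the real list may differ.
BLOCK = {"body", "div", "section", "p", "blockquote", "li",
         "h1", "h2", "h3", "h4", "h5", "h6"}

def local(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]

def needs_newline_fix(xhtml_text):
    """True if some block tag with no block descendants contains a newline."""
    root = ET.fromstring(xhtml_text)
    for elem in root.iter():
        if local(elem.tag) not in BLOCK:
            continue
        if any(local(d.tag) in BLOCK for d in elem.iter() if d is not elem):
            continue  # contains another block tag, so not the problem case
        if "\n" in "".join(elem.itertext()):
            return True
    return False

def flatten_newlines(name, xhtml_text):
    """Replace every newline with a space, except in nav.xhtml."""
    if posixpath.basename(name) == "nav.xhtml":
        return xhtml_text
    if not needs_newline_fix(xhtml_text):
        return xhtml_text
    return xhtml_text.replace("\n", " ")

doc = '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>one\ntwo</p></body></html>'
print(flatten_newlines("text/ch1.xhtml", doc))
```

As in the description above, the replacement is applied to the whole file text, not only to the block that triggered the test.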

Running these 2 regexes (the second one only when its test matches) before beautifying handles more cases correctly.

Kovid Goyal (kovid) wrote :

Implementing it with regexes is not really suitable, as that is not how the editor works. Instead look at polish/pretty.py in the calibre source code, where it is implemented on a parsed tree.
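To illustrate what working on a parsed tree can look like, here is a small stand-alone sketch. It uses Python's standard xml.etree.ElementTree and is not the code in polish/pretty.py; the BLOCK set, the function name indent_blocks, and the sample markup are assumptions. It indents the children of any element that has at least one block level child, so an empty span before a div no longer affects the indentation of that div.

```python
import xml.etree.ElementTree as ET

# Assumed block level tag set, for illustration only.
BLOCK = {"body", "div", "section", "p"}

def local(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]

def indent_blocks(elem, level=0, step="  "):
    """Indent the children of elements that have a block level child."""
    children = list(elem)
    if not any(local(c.tag) in BLOCK for c in children):
        return  # only inline content: leave it untouched
    inner = "\n" + step * (level + 1)
    outer = "\n" + step * level
    # Only whitespace is rewritten, so real text is never changed.
    if not (elem.text or "").strip():
        elem.text = inner
    for i, child in enumerate(children):
        indent_blocks(child, level + 1, step)
        if not (child.tail or "").strip():
            child.tail = outer if i == len(children) - 1 else inner

# Hypothetical input modelled on the "section" example in this report.
SOURCE = (
    '<body><div class="calibre1"></div>'
    '<section class="chap"><span title="5"></span>'
    '<div class="calibre2"></div></section></body>'
)
root = ET.fromstring(SOURCE)
indent_blocks(root)
pretty = ET.tostring(root, encoding="unicode", short_empty_elements=False)
print(pretty)
```

In this sketch the empty span is placed on its own line inside the section; whether that is the desired layout is a separate design choice.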

Changed in calibre:
status: Won't Fix → In Progress