Comment 7 for bug 1836205

Orpheu (marcel854-poire) wrote:

I have identified two cases in which beautifying the files does not work. In each case, the problem can be solved by running a regex before beautifying.

1) The case described above:

- an "a" or "span" tag that is self-closing or immediately followed by its closing tag (i.e. it contains no text), as is the case for pagination markers;
- or a self-closing "br" tag located after a closing "p" or "div" tag (and before the next opening "p" or "div" tag);
- or this new case:

<body epub:type="bodymatter" lang="fr" xml:lang="fr" id="UGI0-4418" class="calibre">
<div class="calibre1"></div>
<section class="chap" type="chapter"><span epub:type="pagebreak" title="5"></span>
  <div class="calibre2"></div>

In this last case, the problem arises because the pagination marker is inside the "section" tag, immediately after its opening tag.
The page number can be given either in an "id" attribute, as shown at the beginning, or in a "title" attribute, as in this last example.
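To illustrate the first condition, here is a minimal Python sketch that finds "a" or "span" tags that are self-closing or immediately followed by their own closing tag. The pattern `EMPTY_INLINE` and the helper `find_empty_inline_tags` are hypothetical: they are not the regex at the regex101 link, only an assumption about what such a pattern could look like, written with Python's standard `re` module.

```python
import re

# Hypothetical pattern: an <a> or <span> tag that is either self-closing
# (<a id="p5"/>) or immediately followed by its own closing tag, possibly
# with only whitespace in between (<span title="5"></span>).
EMPTY_INLINE = re.compile(r'<(a|span)\b[^>]*?(?:/>|>\s*</\1\s*>)')


def find_empty_inline_tags(xhtml: str) -> list[str]:
    """Return every empty or self-closing <a>/<span> tag in xhtml, in order."""
    return [m.group(0) for m in EMPTY_INLINE.finditer(xhtml)]


sample = '<p>x<span epub:type="pagebreak" title="5"></span><a id="p6"/></p>'
print(find_empty_inline_tags(sample))
# ['<span epub:type="pagebreak" title="5"></span>', '<a id="p6"/>']
```

In this sketch a span that contains only whitespace is treated as empty, and a span containing text (such as `<span class="b">t</span>`) is not matched.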

Running the following regex before beautifying makes the beautification work correctly:

https://regex101.com/r/YdWa7o/3

When the XHTML file begins with a sequence of "div" tags, as in the attached file, the regex moves the span tag up to just after the body tag. Try it on the attached file.
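The move described in the previous paragraph can be sketched in Python as follows. This is a hypothetical reconstruction, not the regex at the regex101 link: it assumes the file has a body opening tag followed only by empty "div" tags, then a "section" whose first child is an empty pagination "span". The names `MOVE_UP` and `move_pagination_span_up` are invented for the example.

```python
import re

# Hypothetical pattern for the layout shown in the example above.
MOVE_UP = re.compile(
    r'(<body\b[^>]*>)'                                      # 1: body opening tag
    r'((?:\s*<div\b[^>]*>\s*</div>)*\s*<section\b[^>]*>)'   # 2: empty divs + section tag
    r'\s*(<span\b[^>]*>\s*</span>|<span\b[^>]*/>)'          # 3: empty pagination span
)


def move_pagination_span_up(xhtml: str) -> str:
    """Move the first leading pagination span to just after <body ...>.

    The text is returned unchanged when the assumed layout is not found.
    """
    return MOVE_UP.sub(r'\1\3\2', xhtml, count=1)


src = ('<body class="calibre">\n<div class="calibre1"></div>\n'
       '<section class="chap"><span epub:type="pagebreak" title="5"></span>\n'
       '  <div class="calibre2"></div>')
print(move_pagination_span_up(src))
```

Only the first occurrence is rewritten (`count=1`), since the problem concerns the start of the file.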

2) There are newline characters (\n) inside a block tag that does not itself contain any block tag. Replacing all \n in the XHTML file with spaces then allows it to be beautified correctly.
Do not replace the \n in the nav.xhtml file, if the book has one.
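A minimal sketch of this rule, assuming the book's files are available as a plain Python dict mapping file names to their text (a hypothetical stand-in, not Calibre's API): every \n in an .xhtml file is replaced with a space, and any file named nav.xhtml is left untouched.

```python
import posixpath


def flatten_newlines(files: dict[str, str]) -> dict[str, str]:
    """Return a copy of files where every newline in .xhtml files becomes a space.

    `files` is a hypothetical mapping from a file name inside the book
    (for example "Text/ch01.xhtml") to that file's text. Files named
    nav.xhtml and non-XHTML files are copied unchanged.
    """
    result = {}
    for name, text in files.items():
        base = posixpath.basename(name)
        if base.endswith('.xhtml') and base != 'nav.xhtml':
            text = text.replace('\n', ' ')  # \n -> \x20 (a space)
        result[name] = text
    return result
```

The check is on the base name, so nav.xhtml is skipped whatever folder it is stored in.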

The following regex tests whether replacing \n is necessary:

https://regex101.com/r/DA1uFt/1
Select the "dot matches all" option.

If this regex finds at least one match (count > 0), run the following regex before beautifying:

https://regex101.com/r/br5eV6/2
Select the "dot matches all" option.
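The test-then-replace procedure can be sketched in Python as below. The set of block tags in `BLOCK` and the pattern `NEWLINE_IN_LEAF_BLOCK` are assumptions, not the regexes at the regex101 links. `re.DOTALL` plays the role of the "dot matches all" option, and "count > 0" is expressed as "at least one match".

```python
import re

# Hypothetical list of block-level tags; the real regexes may use another set.
BLOCK = r'(?:p|div|h[1-6]|li|blockquote|section)'

# A block element whose content contains a newline but no nested block tag.
# The tempered token (?:(?!</?BLOCK\b).) refuses to step over any opening or
# closing block tag; DOTALL lets "." match newlines as well.
NEWLINE_IN_LEAF_BLOCK = re.compile(
    rf'<({BLOCK})\b[^>]*>'          # opening block tag; group 1 = tag name
    rf'(?:(?!</?{BLOCK}\b).)*?'     # content with no block tag ...
    r'\n'                           # ... that reaches a newline ...
    rf'(?:(?!</?{BLOCK}\b).)*?'     # ... and more content with no block tag
    r'</\1\s*>',                    # the matching closing tag
    re.DOTALL,
)


def fix_newlines_if_needed(xhtml: str) -> str:
    """Replace every newline with a space, but only if the test pattern matches."""
    if NEWLINE_IN_LEAF_BLOCK.search(xhtml) is None:  # count == 0: nothing to do
        return xhtml
    return xhtml.replace('\n', ' ')
```

For example, `'<div>\n<p>one line</p>\n</div>'` is returned unchanged (the newlines sit between block tags), while `'<body>\n<p class="x">a\nb</p>\n</body>'` becomes `'<body> <p class="x">a b</p> </body>'`.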

The pattern skips the nav.xhtml file.
If you choose "all text files" as the scope, you can simply replace \n with \x20 (a space) in the last regex, because that option already excludes the nav.xhtml file.

Running these two regexes before beautifying (the second one only when the test regex finds a match) allows more cases to be handled correctly.