convert does not preserve spaces in <pre>

Bug #1349536 reported by Mauro
This bug affects 1 person
Affects: calibre
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

A recipe run through ebook-convert does not preserve spaces in nested tags. For example, it produces

<div class="highlight-python"><div class="highlight"><pre><span class="kn">from</span> <span class="nn">django.conf.urls</span> <span class="kn">import</span> <span class="n">patterns</span>
<span class="n">urlpatterns</span> <span class="o">=</span> <span class="n">patterns</span><span class="p">(</span><span class="s">''</span><span class="p">,</span>
<span class="p">(</span><span class="s">r'^articles/(\d{4})/$'</span><span class="p">,</span> <span class="s">'news.views.year_archive'</span><span class="p">),</span>
<span class="p">(</span><span class="s">r'^articles/(\d{4})/(\d{2})/$'</span><span class="p">,</span> <span class="s">'news.views.month_archive'</span><span class="p">),</span>
<span class="p">(</span><span class="s">r'^articles/(\d{4})/(\d{2})/(\d+)/$'</span><span class="p">,</span> <span class="s">'news.views.article_detail'</span><span class="p">),</span>
<span class="p">)</span>
</pre></div>

instead of

<div class="highlight-python"><div class="highlight"><pre><span class="kn">from</span> <span class="nn">django.conf.urls</span> <span class="kn">import</span> <span class="n">patterns</span>

<span class="n">urlpatterns</span> <span class="o">=</span> <span class="n">patterns</span><span class="p">(</span><span class="s">''</span><span class="p">,</span>
    <span class="p">(</span><span class="s">r'^articles/(\d{4})/$'</span><span class="p">,</span> <span class="s">'news.views.year_archive'</span><span class="p">),</span>
    <span class="p">(</span><span class="s">r'^articles/(\d{4})/(\d{2})/$'</span><span class="p">,</span> <span class="s">'news.views.month_archive'</span><span class="p">),</span>
    <span class="p">(</span><span class="s">r'^articles/(\d{4})/(\d{2})/(\d+)/$'</span><span class="p">,</span> <span class="s">'news.views.article_detail'</span><span class="p">),</span>
<span class="p">)</span>
</pre></div>

This breaks Python indentation.

As a workaround, you can replace the embedded BeautifulSoup.py ("3.0.5") with a newer version ("3.2.1").

I do not know whether it needs some tweaking.

Calibre 1.46 on GNU/Linux Ubuntu 14.04

Kovid Goyal (kovid) wrote : Re: calibre bug 1349536

The version of BS that ships with calibre is heavily modified and cannot
simply be replaced since literally thousands of recipes depend on it.

You can either use preprocess_raw_html() in your recipe to fix up the
HTML however you like before it is parsed by BeautifulSoup, or you can
use the JavascriptRecipe class instead of BasicNewsRecipe, which uses
html5lib for parsing.
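
A minimal sketch of the first suggestion, assuming the recipe's raw-HTML hook has the signature calibre documents as `preprocess_raw_html(raw_html, url)`. The helper name `flatten_pre_spans` is hypothetical. It removes the `<span>` wrappers inside `<pre>` blocks so that the code becomes plain text directly inside `<pre>`, at the cost of losing syntax highlighting. Whether this fully sidesteps the whitespace loss in the embedded BeautifulSoup has not been verified here.

```python
import re

# Match a whole <pre>...</pre> block, capturing the open tag, body and close tag.
PRE_BLOCK = re.compile(r'(<pre\b[^>]*>)(.*?)(</pre>)', re.DOTALL | re.IGNORECASE)
# Match opening and closing <span> tags (assumes no '>' inside attribute values).
SPAN_TAG = re.compile(r'</?span\b[^>]*>', re.IGNORECASE)


def flatten_pre_spans(raw_html):
    """Strip <span> tags inside <pre> blocks, leaving the text and whitespace intact."""
    def _flatten(match):
        open_tag, body, close_tag = match.groups()
        return open_tag + SPAN_TAG.sub('', body) + close_tag
    return PRE_BLOCK.sub(_flatten, raw_html)


# Inside calibre, a recipe could delegate to the helper (not runnable outside calibre):
#
# class MyRecipe(BasicNewsRecipe):
#     def preprocess_raw_html(self, raw_html, url):
#         return flatten_pre_spans(raw_html)

sample = ('<pre><span class="n">a</span> <span class="o">=</span>\n'
          '    <span class="p">(</span>\n</pre>')
print(flatten_pre_spans(sample))
```

Markup outside `<pre>` is left untouched, so highlighting elsewhere on the page is unaffected.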

If you do want to fix BS, then a limited patch that fixes only this issue
against the existing embedded version of BS is welcome.

 status wontfix

Changed in calibre:
status: New → Won't Fix
Mauro (gaionim) wrote :

Please double-check, but this patch works for me.

Kovid Goyal (kovid) wrote : Fixed in master

Fixed in branch master. The fix will be in the next release. calibre is usually released every Friday.

 status fixreleased

Changed in calibre:
status: Won't Fix → Fix Released