[Enhancement] Add support for Beautiful Soup 4
Bug #1247222 reported by TomasHnyk
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
calibre | Won't Fix | Undecided | Unassigned |
Bug Description
Calibre bundles Beautiful Soup 3.
Beautiful Soup 4 has been stable for a year and a half now, according to its website.
I know that I could import v4 manually, but I am writing a recipe that I would like to share with other users and so I do not want to rely on functionality that they might lack.
v4 seems to be an improvement (and both can be used in parallel). I especially like the insert_after and insert_before methods, and it can also reliably extract all visible text with the text method (useful for counting the words of an article). These would certainly simplify my code a lot.
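For illustration, here is a minimal sketch of the Beautiful Soup 4 calls mentioned above. It assumes the `bs4` package is installed separately (it is not bundled with calibre), and the markup and variable names are made up for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only for illustration.
markup = "<div><p id='b'>second <b>bold</b></p></div>"
soup = BeautifulSoup(markup, "html.parser")
node = soup.find("p", id="b")

# Insert a new sibling directly before the node.
before = soup.new_tag("hr")
node.insert_before(before)

# Insert a new sibling directly after the node.
after = soup.new_tag("span")
after.string = "note"
node.insert_after(after)

# Collect the text of the whole tree and count its words.
text = soup.get_text(" ", strip=True)
word_count = len(text.split())
sibling_names = [child.name for child in soup.div.children]
```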
I'm afraid I am not going to add code to the calibre distribution to
support one recipe. Note that most of calibre (apart from the recipe
system, for legacy reasons) has moved to using html5lib and lxml for
HTML parsing. I suggest you do the same in your recipe. See the builtin
recipes for Caravan Magazine, toi or Houston Chronicle for examples.
In lxml, inserting before a node is as simple as:
node.getparent().insert(node.getparent().index(node), other_node)
(use index(node) + 1 to insert after it), and you can use the full power of XPath to select nodes.
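A slightly fuller sketch of the same idea, assuming lxml is importable in the recipe. The markup and variable names are hypothetical, and `lxml.etree` is used here on a well-formed snippet only to keep the example short.

```python
from lxml import etree

# Hypothetical, well-formed markup used only for illustration.
root = etree.fromstring(
    "<div><p id='a'>first</p><p id='b'>second</p></div>"
)

# Select the target node with XPath.
node = root.xpath(".//p[@id='b']")[0]
parent = node.getparent()

# Insert a new element before the node.
before = etree.Element("hr")
parent.insert(parent.index(node), before)

# Insert a new element after the node (note the + 1).
after = etree.Element("span")
after.text = "note"
parent.insert(parent.index(node) + 1, after)

tags = [child.tag for child in root]
```

lxml elements also provide addprevious() and addnext(), which may read more naturally for sibling insertion.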
status wontfix