Inconsistent Results During Nested Find Operation
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Beautiful Soup | Won't Fix | Undecided | Unassigned | | |
Bug Description
Here is a reproduction:
http://
In short, when I use Beautiful Soup to parse an SVG file (inlined for convenience), it occasionally drops some of the symbols that I am trying to extract. It always runs correctly the first time the function is called, but subsequent invocations can return different results. The above link runs the same function six times. The function contains a pair of nested find_all calls and the corresponding loops. The first three calls are identical, but they get different results:
(For reference, a "strike" here is 1:1 with a symbol/character in the SVG file being "struck" onto the surface.)
----
32 unique symbols
44 strikes
----
32 unique symbols
38 strikes
----
32 unique symbols
0 strikes
WEIRD!!!!
Anyway, after some fidgeting I discovered that this doesn't happen if I first convert the input into a bytes object. After that, the call always produces the same results.
Bytesified
----
32 unique symbols
44 strikes
----
32 unique symbols
44 strikes
----
32 unique symbols
44 strikes
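A minimal sketch of the bytes workaround, for anyone trying it on their own file. The SVG below is a small hypothetical stand-in (the real file is only in the reproduction link), `count_strikes` is an illustrative name rather than the function from the reproduction, and `html.parser` is an assumed parser choice since the report does not say which one was used:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the SVG in the report (the real file is not shown here).
SVG_TEXT = """<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
  <defs>
    <symbol id="glyph-a"></symbol>
    <symbol id="glyph-b"></symbol>
  </defs>
  <g><use xlink:href="#glyph-a"></use><use xlink:href="#glyph-b"></use></g>
  <g><use xlink:href="#glyph-a"></use></g>
</svg>"""


def count_strikes(markup):
    """Return (unique symbols, strikes) using a pair of nested find_all loops."""
    # "html.parser" is an assumption; the report does not name the parser.
    soup = BeautifulSoup(markup, "html.parser")
    symbols = {sym.get("id") for sym in soup.find_all("symbol")}
    strikes = 0
    for group in soup.find_all("g"):
        for use in group.find_all("use"):
            # A strike is a <use> that points at one of the known symbols.
            if use.get("xlink:href", "").lstrip("#") in symbols:
                strikes += 1
    return len(symbols), strikes


# Workaround from the report: hand the parser bytes rather than str.
svg_bytes = SVG_TEXT.encode("utf-8")
for _ in range(3):
    print(count_strikes(svg_bytes))
```

This only shows the shape of the workaround; it is not expected to reproduce the instability on such a small input.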
The XML seems legit and passes the simple checks I have run it through. Even if it weren't, I would expect stable output.
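One way to check the "stable output" expectation is to call the same extraction function several times on the same input and tally the distinct results. The helper names below (`repeat_and_tally`, `is_stable`) are hypothetical and use only the standard library, so any extraction function that returns a hashable value can be plugged in:

```python
from collections import Counter


def repeat_and_tally(func, arg, runs=6):
    """Call func(arg) `runs` times and count how often each result appears."""
    return Counter(func(arg) for _ in range(runs))


def is_stable(tally):
    """True when every run returned the same result."""
    return len(tally) == 1


def count_use_tags(markup):
    # Stand-in extraction function: a plain substring count, not a real parser.
    return markup.count("<use")


tally = repeat_and_tally(count_use_tags, "<g><use/><use/></g>")
print(tally, "stable" if is_stable(tally) else "UNSTABLE")
```

If the tally has more than one key, the counts show how often each variant occurred, which is handy when attaching results to a report like this one.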
I have no idea where to dig on this one. This is the newest version of bs4 on Ubuntu (linux-64) via Anaconda:
In [2]: bs4.__version__
Out[2]: '4.4.1'
Any thoughts/leads appreciated!
Changed in beautifulsoup:
status: New → Incomplete
Also, happy thanksgiving!