Comment 4 for bug 1974105

Revision history for this message
Luke P (lkp80877984) wrote : Re: slow performance within Tag.index(child)

"reluctant to ... involve caching"
I can completely understand that. It seems to me that others (including parsers) _can_ manipulate .contents, for example, which would invalidate the cache. I'm usually working in C#, where I think you can be a bit clearer about the supported public interface, and more easily have some hidden, private optimizations! Python is a bit free-er.

I do have a workaround, anyway. I've monkey-patched your library, wrapping and replacing a couple of functions to give the caching. It's not beautiful, but it works.

"inefficient ... calls to insert"
Problem isn't just assert. I have other scripts that, for example, extract many many elements. Or insert many many elements. Or replace many many elements. Each of these functions is slow, because each touch index. With a 15MB file, with 150_000 children in a div, it's very slow to CALCULATE the index every time. I can't share this document, but you could generate such a document? html > body > div, with 150_000 paragraphs. Generate it with bs4 even, to exercise the insert operation!

"test code"
Uh... I have a test that randomly generates a series of operations to manipulate a string so I have a known start and endpoint, then does those operations on a bs4 document, and then assert for the end string equivalent. It can catch some issues (I make sure it breaks when I comment out lines), but not all, to be honest... does that sound at all helpful?

Thanks for your hard work maintaining this library!

Cheers,
Luke