Comment 1 for bug 2065904

Leonard Richardson (leonardr) wrote : Re: Improve copy.copy() runtime

As you discovered in the thread, the problem is that sourceline and sourcepos aren't always set on Tag. When a nonexistent attribute of a Tag is accessed, Beautiful Soup treats the access as a call to find() and starts looking for a child tag of that name. Here, sourceline and sourcepos simply have no values, because lxml doesn't provide that information (at least not in the way we use it). The values should be set to None in the Tag constructor if they're not provided.
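To illustrate the mechanism, here is a rough sketch using a toy class. ToyTag, its set_source flag, and the find_calls counter are all hypothetical names made up for this example; this is not Beautiful Soup's actual code, only a model of the attribute fallback described above. It shows how reading an attribute that was never set can turn into a search of the whole subtree, while an attribute explicitly set to None is returned immediately.

```python
class ToyTag:
    """Toy stand-in for a tag (hypothetical; not Beautiful Soup's Tag)."""

    find_calls = 0  # counts how many times find() runs, for illustration

    def __init__(self, name, children=None, sourceline=None, set_source=True):
        self.name = name
        self.children = list(children or [])
        # set_source=False models a parser that never supplies the value,
        # leaving the attribute missing from the instance entirely.
        if set_source:
            self.sourceline = sourceline

    def find(self, name):
        """Depth-first search for a descendant with the given name."""
        ToyTag.find_calls += 1
        for child in self.children:
            if child.name == name:
                return child
            found = child.find(name)
            if found is not None:
                return found
        return None

    def __getattr__(self, name):
        # Python calls __getattr__ only when normal attribute lookup fails.
        if name.startswith("__"):
            raise AttributeError(name)
        return self.find(name)


def build_tree(set_source):
    """Build html > body > 1000 p elements; only the root varies."""
    paragraphs = [ToyTag("p") for _ in range(1000)]
    return ToyTag("html", [ToyTag("body", paragraphs)], set_source=set_source)


# Attribute never set: the lookup falls through to a search of the tree.
unset_tree = build_tree(set_source=False)
ToyTag.find_calls = 0
unset_value = unset_tree.sourceline
unset_calls = ToyTag.find_calls

# Attribute set to None in the constructor: plain lookup, no search.
set_tree = build_tree(set_source=True)
ToyTag.find_calls = 0
set_value = set_tree.sourceline
set_calls = ToyTag.find_calls

print(unset_value, unset_calls)
print(set_value, set_calls)
```

Both lookups return None, but only the first one walks the tree to get there, and that cost grows with the size of the document.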

As of revision 8900598 in the 4.13 branch, sourceline and sourcepos are always set. Try your benchmark again and see how much of an improvement you get.