builder slow with lots of elements

Bug #1075622 reported by Anders Hammarquist
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
lxml
Fix Released
Medium
scoder

Bug Description

Python : sys.version_info(major=2, minor=7, micro=3, releaselevel='final', serial=0)
lxml.etree : (2, 3, 2, 0)
libxml used : (2, 8, 0)
libxml compiled : (2, 7, 8)
libxslt used : (1, 1, 26)
libxslt compiled : (1, 1, 26)

builder/ElementMaker uses len(elem) to find out if an element has any child nodes. This will walk the linked list, counting the nodes, which takes O(n), and can add up if you have lots of child nodes.

Attached patch simply tries to add to the last child, and if that gives an IndexError we use the no-child path. This is much quicker, a test I did went from 30 minutes to 3 minutes.

/Anders

Revision history for this message
Anders Hammarquist (iko-0) wrote :
Revision history for this message
scoder (scoder) wrote :

Yes, that makes sense. This was originally written for ET where len(elem) is O(1) because it's backed by a Python list.

Thanks!

https://github.com/lxml/lxml/commit/f428ea38f2427313bee3d27abcab8dadda2b5af1

Changed in lxml:
assignee: nobody → Stefan Behnel (scoder)
importance: Undecided → Medium
status: New → Fix Committed
Revision history for this message
scoder (scoder) wrote :

Released in lxml 3.1.

Changed in lxml:
status: Fix Committed → Fix Released
scoder (scoder)
Changed in lxml:
milestone: none → 3.1
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.