builder slow with lots of elements

Bug #1075622 reported by Anders Hammarquist on 2012-11-06
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
lxml
Medium
scoder

Bug Description

Python : sys.version_info(major=2, minor=7, micro=3, releaselevel='final', serial=0)
lxml.etree : (2, 3, 2, 0)
libxml used : (2, 8, 0)
libxml compiled : (2, 7, 8)
libxslt used : (1, 1, 26)
libxslt compiled : (1, 1, 26)

builder/ElementMaker uses len(elem) to find out if an element has any child nodes. This will walk the linked list, counting the nodes, which takes O(n), and can add up if you have lots of child nodes.

Attached patch simply tries to add to the last child, and if that gives an IndexError we use the no-child path. This is much quicker, a test I did went from 30 minutes to 3 minutes.

/Anders

Anders Hammarquist (iko-0) wrote :
scoder (scoder) wrote :

Yes, that makes sense. This was originally written for ET where len(elem) is O(1) because it's backed by a Python list.

Thanks!

https://github.com/lxml/lxml/commit/f428ea38f2427313bee3d27abcab8dadda2b5af1

Changed in lxml:
assignee: nobody → Stefan Behnel (scoder)
importance: Undecided → Medium
status: New → Fix Committed
scoder (scoder) wrote :

Released in lxml 3.1.

Changed in lxml:
status: Fix Committed → Fix Released
scoder (scoder) on 2013-04-28
Changed in lxml:
milestone: none → 3.1
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers