Use XPath to parse LP-Pages

Bug #93499 reported by Markus Korn on 2007-03-18
Affects: python-launchpad-bugs
Importance: Wishlist
Assigned to: Markus Korn

Bug Description

So far we have been using regular expressions to parse the LP pages. Most of these regexes are complicated and hard to maintain. XPath is more intuitive to use.

The attached patch against bughelper.main r118 provides an XPath-based implementation. The HTML code of the LP pages is parsed with libxml2.htmlParseDoc.
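
To illustrate the difference between the two approaches, here is a minimal sketch, not the code from the patch. It assumes a made-up, well-formed page fragment (not Launchpad's real markup) and uses the limited XPath subset in Python's standard xml.etree.ElementTree instead of libxml2, so that it runs without extra packages. The names PAGE, title_with_regex and title_with_xpath are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment for illustration only;
# this is NOT the markup Launchpad actually serves.
PAGE = """
<html>
  <body>
    <h1 id="bug-title">Use XPath to parse LP-Pages</h1>
    <table id="affects">
      <tr>
        <td class="target">python-launchpad-bugs</td>
        <td class="importance">Wishlist</td>
      </tr>
    </table>
  </body>
</html>
"""


def title_with_regex(page):
    """Regex approach: the pattern has to spell out the tag syntax itself."""
    match = re.search(r'<h1[^>]*id="bug-title"[^>]*>(.*?)</h1>', page, re.S)
    return match.group(1).strip() if match else None


def title_with_xpath(root):
    """XPath approach: describe where the node sits in the tree."""
    node = root.find(".//h1[@id='bug-title']")
    return node.text.strip() if node is not None and node.text else None


# ElementTree needs well-formed input; real HTML pages would need a
# tolerant HTML parser (the patch uses libxml2.htmlParseDoc for that).
root = ET.fromstring(PAGE)

# A second lookup: the importance cell of the "affects" table.
importance = root.findtext(".//table[@id='affects']/tr/td[@class='importance']")

print(title_with_regex(PAGE))
print(title_with_xpath(root))
print(importance)
```

The XPath expressions name the element and its attributes directly, whereas the regex must also cope with attribute order, whitespace and nesting by hand.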

In some cases I was unable to replace the regex with an equivalent (simple) XPath construction.
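
One possible way to handle such cases, shown here only as a sketch and not taken from the patch: XPath 1.0 has no regular-expression functions, so the node can be selected with XPath and a small regex applied to its text afterwards. The fragment, the class name "meta", and the names META_RE and parse_meta are hypothetical, and the standard xml.etree.ElementTree module stands in for libxml2.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical fragment; the element and class names are assumptions.
FRAGMENT = """
<div id="comments">
  <p class="meta">Bug #93499 reported by Markus Korn on 2007-03-18</p>
</div>
"""

# The regex only has to understand one line of plain text, not HTML.
META_RE = re.compile(r"Bug #(\d+) reported by (.+) on (\d{4}-\d{2}-\d{2})")


def parse_meta(root):
    """Locate the node with XPath, then split its text with a regex."""
    node = root.find(".//p[@class='meta']")
    if node is None or node.text is None:
        return None
    match = META_RE.search(node.text)
    if match is None:
        return None
    number, reporter, date = match.groups()
    return int(number), reporter, date


meta = parse_meta(ET.fromstring(FRAGMENT))
print(meta)
```

This keeps the regex short and confined to text content, while the structural part of the lookup stays in XPath.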

This code needs to be tested and reviewed.
Also, someone who is more familiar with XPath should review the statements and constructions I have chosen.

Markus

Markus Korn (thekorn) wrote :
Changed in bughelper:
assignee: nobody → thekorn
importance: Undecided → Wishlist
status: Unconfirmed → In Progress
Daniel Holbach (dholbach) wrote :

I pushed an updated and slightly modified patch to https://code.launchpad.net/~bugsquad/bughelper/xpath - let's continue our work together in there.

Markus Korn (thekorn) wrote :

Set "Fix Status" to "Abandoned Attempt" because further development should happen in the XPath version in python-launchpad-bugs. Also added the python-launchpad-bugs/xpath branch.

Markus

Markus Korn (thekorn) wrote :

Merged into main:

------------------------------------------------------------
revno: 6
committer: Markus Korn <email address hidden>
branch nick: main
timestamp: Thu 2007-04-26 12:42:35 +0200
message:
  use XPath to parse launchpad's HTML-pages instead of regular expressions
    ------------------------------------------------------------
    revno: 4.1.7
    merged: <email address hidden>
    committer: Markus Korn <email address hidden>
    branch nick: xpath
    timestamp: Thu 2007-04-26 12:39:24 +0200
    message:
      fixed issue in BugAttachment; added some more usefull error-messages
    ------------------------------------------------------------
    revno: 4.1.6
    merged: <email address hidden>
    committer: Markus Korn <email address hidden>
    branch nick: xpath
    timestamp: Thu 2007-04-26 10:11:36 +0200
    message:
      adding assert statement to check URL in BugAttachment
    ------------------------------------------------------------
    revno: 4.1.5
    merged: <email address hidden>
    committer: Markus Korn <email address hidden>
    branch nick: xpath
    timestamp: Thu 2007-04-26 09:55:52 +0200
    message:
      fix error in Bug.info, thanks to Daniel Holbach; adding information to assert statements
    ------------------------------------------------------------
    revno: 4.1.4
    merged: <email address hidden>
    committer: Markus Korn <email address hidden>
    branch nick: xpath
    timestamp: Tue 2007-04-24 12:28:49 +0200
    message:
      merged fix for bug 109213
    ------------------------------------------------------------
    revno: 4.1.3
    merged: <email address hidden>
    committer: Markus Korn <email address hidden>
    branch nick: xpath
    timestamp: Tue 2007-04-24 12:14:32 +0200
    message:
      add reporter and proptags property to the Bug object; choosen 'proptags' as name to avoid conflicts with existing tags attribute (has to be renamed later!)
    ------------------------------------------------------------
    revno: 4.1.2
    merged: <email address hidden>
    committer: Markus Korn <email address hidden>
    branch nick: xpath
    timestamp: Tue 2007-04-24 12:10:27 +0200
    message:
      rename hdoc into xmldoc; adding xmldoc attribute to bug-object; adding bugreport attribute to bug-object to get the pure text of a bug-report
    ------------------------------------------------------------
    revno: 4.1.1
    merged: <email address hidden>
    committer: Markus Korn <email address hidden>
    branch nick: xpath
    timestamp: Mon 2007-04-23 11:44:09 +0200
    message:
      use XPath in BugPage and Bug
------------------------------------------------------------

Changed in python-launchpad-bugs:
status: In Progress → Fix Committed
Markus Korn (thekorn) on 2007-05-01
Changed in python-launchpad-bugs:
status: Fix Committed → Fix Released