Activity log for bug #191199

Date Who What changed Old value New value Message
2008-02-12 09:17:26 Gavin Panella bug added bug
2008-02-12 10:41:36 Gavin Panella bug added subscriber Graham Binns
2008-02-27 13:16:48 Diogo Matsubara malone: status New Confirmed
2008-04-16 21:45:34 Diogo Matsubara description
Old value:
On 11 Feb 2008, at 14:17, James Henstridge wrote:
> On 11/02/2008, Gavin Panella <gavin.panella@canonical.com> wrote:
>> On 9 Feb 2008, at 21:26, Christian Robottom Reis wrote:
>>> Because I don't have anything better to do I also looked at the
>>> checkwatches failures today. They are not all bad. But there's one
>>> which bothered me; elinks.cz fails when pulling information for bug 987.
>>>
>>> 10:58:23 INFO Updating 1 watches on http://bugzilla.elinks.cz
>>>
>>> 10:58:26 ERROR Failed to parse XML description for
>>> http://bugzilla.elinks.cz bugs [u'987']: syntax error: line 10,
>>> column 62
>>>
>>> Now this is failing because somebody decided it would be a good
>>> idea to put a non-printable character in a bug comment:
>>>
>>> http://bugzilla.elinks.cz/show_bug.cgi?id=987#c2
>>>
>>> What should our long-term plan be for this sort of situation? Get it
>>> fixed upstream? Or replace unprintables when importing comments? Or
>>> blacklisting bugs so they stop spamming our logs?
>>
>> Bugzilla shouldn't create invalid XML, so it should ideally be fixed
>> there. See:
>>
>> https://bugzilla.mozilla.org/show_bug.cgi?id=105960
>>
>> But... it's been open since 2001 and has actually been commented on
>> by James H as recently as April 2002. Looks like this one is not
>> getting fixed, and we should probably try to unfuck the XML from
>> Bugzilla ourselves.
>
> My opinion is that since Bugzilla does not guarantee that it will
> produce valid XML, we should not treat said data as XML.
>
> I'd suggest using the BeautifulSoup.BeautifulStoneSoup class
> (BeautifulSoup minus HTML specific tweaks) to do the parsing. This
> should give us some data even for invalid pages:
>
> >>> import urllib2
> >>> from BeautifulSoup import BeautifulStoneSoup
> >>> data = urllib2.urlopen(
> ...     'http://bugzilla.elinks.cz/xml.cgi?id=987').read()
> >>> soup = BeautifulStoneSoup(data)
> >>> for comment in soup.findAll('long_desc'):
> ...     print repr(comment.find('thetext').renderContents())
> ...
> 'Patch against elinks-0.11 GIT based on https://bugs.launchpad.net/bugs/64590'
> 'Created an attachment (id=423)\nTypos and language corrections in ELinks strings\n'
> 'Looks good. Should I credit\x01Malcolm Parsons in the AUTHORS file?'
> 'Yes, please credit Malcolm.'
>
> We still need to work out what to do about character encodings, but
> that is necessary anyway: as mentioned in the bug report old Bugzilla
> had no concept of character encoding, so old bug data can be
> misencoded (one of the sources of invalid XML from bugzilla).
>
> James.
New value: the old value unchanged, followed by this added line:
OOPS-830CCW7 shows the problem and the Exception type is UnparseableBugData
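The "replace unprintables when importing comments" option raised in the thread above can be sketched roughly as follows. This is a minimal illustration, not the fix that was eventually committed to Launchpad: the helper names `strip_illegal_xml_chars` and `parse_comments` and the sample document are hypothetical, and only the `thetext` element name and the `\x01` character come from the thread. It assumes the only problem is C0 control characters, which XML 1.0 forbids apart from tab, line feed and carriage return; it does nothing about the character-encoding issue James mentions.

```python
import re
import xml.etree.ElementTree as ET

# C0 control characters that XML 1.0 does not allow in a document
# (everything below 0x20 except tab, line feed and carriage return).
_ILLEGAL_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def strip_illegal_xml_chars(text, replacement=""):
    """Return text with XML-illegal control characters replaced."""
    return _ILLEGAL_XML_CHARS.sub(replacement, text)


def parse_comments(xml_text):
    """Parse sanitized XML and return the text of each <thetext> element."""
    root = ET.fromstring(strip_illegal_xml_chars(xml_text))
    return [node.text for node in root.iter("thetext")]


# Hypothetical sample shaped like the Bugzilla output discussed above,
# with the same stray \x01 character inside a comment.
sample = (
    "<bugzilla><bug><long_desc>"
    "<thetext>Looks good. Should I credit\x01Malcolm Parsons "
    "in the AUTHORS file?</thetext>"
    "</long_desc></bug></bugzilla>"
)

if __name__ == "__main__":
    for comment in parse_comments(sample):
        print(repr(comment))
```

Compared with the BeautifulStoneSoup approach suggested in the thread, this keeps a strict XML parser and so still fails on other kinds of malformed output; which trade-off is preferable depends on how much of the remote data is expected to be broken.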
2008-05-06 14:04:21 Graham Binns bug assigned to bugzilla
2008-05-06 14:10:25 Bug Watch Updater bugzilla: status Unknown Confirmed
2008-12-30 02:34:21 Bug Watch Updater bugzilla: status Confirmed Fix Released
2008-12-31 11:29:13 Gavin Panella malone: statusexplanation The root bug has been marked as Fix Released in Bugzilla, so we should revisit this issue, if only to prioritise and schedule it.
2008-12-31 11:29:13 Gavin Panella malone: milestone 2.2.1
2009-02-05 20:47:29 Björn Tillenius malone: statusexplanation The root bug has been marked as Fix Released in Bugzilla, so we should revisit this issue, if only to prioritise and schedule it.
2009-02-05 20:47:29 Björn Tillenius malone: milestone 2.2.1
2010-01-25 10:47:54 Gavin Panella tags bugwatch oops bugwatch oops story-reliable-bug-syncing
2010-09-18 18:28:33 Bug Watch Updater bugzilla: importance Unknown Medium
2010-09-18 18:28:38 Bug Watch Updater bug watch added https://bugzilla-test.mozilla.org/show_bug.cgi?id=384
2010-09-18 18:28:38 Bug Watch Updater bug watch added https://bugzilla-test.mozilla.org/show_bug.cgi?id=267
2010-09-18 18:28:38 Bug Watch Updater bug watch added http://landfill.bugzilla.org/bugzilla-tip/show_bug.cgi?id=5032
2010-09-18 18:28:38 Bug Watch Updater bug watch added https://bugzilla.gnome.org/show_bug.cgi?id=417196
2010-09-18 18:28:38 Bug Watch Updater bug watch added https://bugs.eclipse.org/bugs/show_bug.cgi?id=140108
2010-11-26 06:15:28 Curtis Hovey malone: status Confirmed Triaged
2010-11-26 06:15:33 Curtis Hovey malone: importance Undecided Low
2011-01-12 18:15:06 Robert Collins launchpad: importance Low Critical
2011-03-05 15:27:41 Curtis Hovey description
Old value: the description as set on 2008-04-16 21:45:34 (the quoted email thread from James Henstridge, ending with "OOPS-830CCW7 shows the problem and the Exception type is UnparseableBugData").
New value: the old value unchanged, followed by this added line:
A newer instance: OOPS-1633CCW184
2011-06-07 21:26:09 Benji York launchpad: assignee Benji York (benji)
2011-06-09 17:14:10 Benji York branch linked lp:~benji/launchpad/bug-191199
2011-06-15 21:20:48 Launchpad QA Bot tags bugwatch lp-bugs oops story-reliable-bug-syncing bugwatch lp-bugs oops qa-needstesting story-reliable-bug-syncing
2011-06-15 21:20:50 Launchpad QA Bot launchpad: status Triaged Fix Committed
2011-06-16 22:40:54 Benji York tags bugwatch lp-bugs oops qa-needstesting story-reliable-bug-syncing bugwatch lp-bugs oops qa-untestable story-reliable-bug-syncing
2011-06-17 02:01:33 William Grant launchpad: status Fix Committed Fix Released