Launchpad itself

Activity log for bug #191199

Date	Who	What changed	Old value	New value	Message
2008-02-12 09:17:26	Gavin Panella	bug			added bug
2008-02-12 10:41:36	Gavin Panella	bug			added subscriber Graham Binns
2008-02-27 13:16:48	Diogo Matsubara	malone: status	New	Confirmed
2008-04-16 21:45:34	Diogo Matsubara	description
Old value:
On 11 Feb 2008, at 14:17, James Henstridge wrote:
> On 11/02/2008, Gavin Panella <gavin.panella@canonical.com> wrote:
>> On 9 Feb 2008, at 21:26, Christian Robottom Reis wrote:
>>> Because I don't have anything better to do I also looked at the
>>> checkwatches failures today. They are not all bad. But there's one
>>> which bothered me; elinks.cz fails when pulling information for bug 987.
>>>
>>> 10:58:23 INFO Updating 1 watches on http://bugzilla.elinks.cz
>>>
>>> 10:58:26 ERROR Failed to parse XML description for
>>> http://bugzilla.elinks.cz bugs [u'987']: syntax error: line 10,
>>> column 62
>>>
>>> Now this is failing because somebody decided it would be a good
>>> idea to put a non-printable character in a bug comment:
>>>
>>> http://bugzilla.elinks.cz/show_bug.cgi?id=987#c2
>>>
>>> What should our long-term plan be for this sort of situation? Get it
>>> fixed upstream? Or replace unprintables when importing comments? Or
>>> blacklisting bugs so they stop spamming our logs?
>>
>> Bugzilla shouldn't create invalid XML, so it should ideally be fixed
>> there. See:
>>
>> https://bugzilla.mozilla.org/show_bug.cgi?id=105960
>>
>> But... it's been open since 2001 and has actually been commented on
>> by James H as recently as April 2002. Looks like this one is not
>> getting fixed, and we should probably try to unfuck the XML from
>> Bugzilla ourselves.
>
> My opinion is that since Bugzilla does not guarantee that it will
> produce valid XML, we should not treat said data as XML.
>
> I'd suggest using the BeautifulSoup.BeautifulStoneSoup class
> (BeautifulSoup minus HTML specific tweaks) to do the parsing. This
> should give us some data even for invalid pages:
>
> >>> import urllib2
> >>> from BeautifulSoup import BeautifulStoneSoup
> >>> data = urllib2.urlopen(
> ...     'http://bugzilla.elinks.cz/xml.cgi?id=987').read()
> >>> soup = BeautifulStoneSoup(data)
> >>> for comment in soup.findAll('long_desc'):
> ...     print repr(comment.find('thetext').renderContents())
> ...
> 'Patch against elinks-0.11 GIT based on https://bugs.launchpad.net/bugs/64590'
> 'Created an attachment (id=423)\nTypos and language corrections in ELinks strings\n'
> 'Looks good. Should I credit\x01Malcolm Parsons in the AUTHORS file?'
> 'Yes, please credit Malcolm.'
>
> We still need to work out what to do about character encodings, but
> that is necessary anyway: as mentioned in the bug report old Bugzilla
> had no concept of character encoding, so old bug data can be
> misencoded (one of the sources of invalid XML from bugzilla).
>
> James.
New value: the old value with this sentence appended:
OOPS-830CCW7 shows the problem and the Exception type is UnparseableBugData
2008-05-06 14:04:21	Graham Binns	bug			assigned to bugzilla
2008-05-06 14:10:25	Bug Watch Updater	bugzilla: status	Unknown	Confirmed
2008-12-30 02:34:21	Bug Watch Updater	bugzilla: status	Confirmed	Fix Released
2008-12-31 11:29:13	Gavin Panella	malone: statusexplanation		The root bug has been marked as Fix Released in Bugzilla, so we should revisit this issue, if only to prioritise and schedule it.
2008-12-31 11:29:13	Gavin Panella	malone: milestone		2.2.1
2009-02-05 20:47:29	Björn Tillenius	malone: statusexplanation	The root bug has been marked as Fix Released in Bugzilla, so we should revisit this issue, if only to prioritise and schedule it.
2009-02-05 20:47:29	Björn Tillenius	malone: milestone	2.2.1
2010-01-25 10:47:54	Gavin Panella	tags	bugwatch oops	bugwatch oops story-reliable-bug-syncing
2010-09-18 18:28:33	Bug Watch Updater	bugzilla: importance	Unknown	Medium
2010-09-18 18:28:38	Bug Watch Updater	bug watch added		https://bugzilla-test.mozilla.org/show_bug.cgi?id=384
2010-09-18 18:28:38	Bug Watch Updater	bug watch added		https://bugzilla-test.mozilla.org/show_bug.cgi?id=267
2010-09-18 18:28:38	Bug Watch Updater	bug watch added		http://landfill.bugzilla.org/bugzilla-tip/show_bug.cgi?id=5032
2010-09-18 18:28:38	Bug Watch Updater	bug watch added		https://bugzilla.gnome.org/show_bug.cgi?id=417196
2010-09-18 18:28:38	Bug Watch Updater	bug watch added		https://bugs.eclipse.org/bugs/show_bug.cgi?id=140108
2010-11-26 06:15:28	Curtis Hovey	malone: status	Confirmed	Triaged
2010-11-26 06:15:33	Curtis Hovey	malone: importance	Undecided	Low
2011-01-12 18:15:06	Robert Collins	launchpad: importance	Low	Critical
2011-03-05 15:27:41	Curtis Hovey	description
Old value: the description as set on 2008-04-16 21:45:34 (the quoted email thread ending "James.", followed by "OOPS-830CCW7 shows the problem and the Exception type is UnparseableBugData").
New value: the old value with this sentence appended:
A newer instance: OOPS-1633CCW184
2011-06-07 21:26:09	Benji York	launchpad: assignee		Benji York (benji)
2011-06-09 17:14:10	Benji York	branch linked		lp:~benji/launchpad/bug-191199
2011-06-15 21:20:48	Launchpad QA Bot	tags	bugwatch lp-bugs oops story-reliable-bug-syncing	bugwatch lp-bugs oops qa-needstesting story-reliable-bug-syncing
2011-06-15 21:20:50	Launchpad QA Bot	launchpad: status	Triaged	Fix Committed
2011-06-16 22:40:54	Benji York	tags	bugwatch lp-bugs oops qa-needstesting story-reliable-bug-syncing	bugwatch lp-bugs oops qa-untestable story-reliable-bug-syncing
2011-06-17 02:01:33	William Grant	launchpad: status	Fix Committed	Fix Released
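
One of the options raised in the description is to "replace unprintables when importing comments". The following is a minimal Python 3 sketch of that idea, not the fix that was actually landed for this bug: it replaces characters that XML 1.0 forbids and then parses with the standard library instead of BeautifulStoneSoup. The helper names (strip_invalid_xml_chars, extract_comments) and the SAMPLE document are hypothetical; only the long_desc and thetext element names and the \x01 comment text come from the description.

```python
import re
import xml.etree.ElementTree as ET

# Characters outside the XML 1.0 "Char" production: C0 controls other than
# tab, newline and carriage return, lone surrogates, and U+FFFE/U+FFFF.
_INVALID_XML_CHARS = re.compile(
    "[\x00-\x08\x0b\x0c\x0e-\x1f\ud800-\udfff\ufffe\uffff]"
)


def strip_invalid_xml_chars(text, replacement=""):
    """Return text with every XML-1.0-invalid character replaced."""
    return _INVALID_XML_CHARS.sub(replacement, text)


def extract_comments(xml_text):
    """Parse Bugzilla-style XML and return the text of each comment.

    Invalid characters are replaced with a space before parsing, so a
    stray control character no longer aborts the whole import.
    """
    root = ET.fromstring(strip_invalid_xml_chars(xml_text, " "))
    return [node.findtext("thetext") for node in root.iter("long_desc")]


# Hypothetical sample modelled on the comment quoted in the description;
# the real xml.cgi output contains many more elements.
SAMPLE = (
    "<bugzilla><bug><long_desc><thetext>"
    "Looks good. Should I credit\x01Malcolm Parsons in the AUTHORS file?"
    "</thetext></long_desc></bug></bugzilla>"
)

if __name__ == "__main__":
    print(extract_comments(SAMPLE))
```

Parsing SAMPLE directly with ET.fromstring raises ParseError because of the \x01 character, while extract_comments returns the comment with a space in its place. This sketch handles only forbidden characters; the mis-encoded legacy data mentioned at the end of the thread would still need separate handling.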