export all bugs in ubuntu as a single text file

Bug #231402 reported by Lucas Nussbaum on 2008-05-17
Affects: Launchpad itself
Importance: Low
Assigned to: Unassigned

Bug Description

I would very much like to get a list of all bugs in Ubuntu in text format, with the summary information about each of them, in a single text file.

The current output available in the +text interface sounds perfect (you could maybe remove the bug description and the comments, which would reduce the size a lot).
Another way to reduce the size would be to exclude bugs marked as "Fix released" everywhere.

That would allow adding Ubuntu bugs to http://qa.debian.org/developer.php and to the Debian package tracking system (I'm willing to push to make that happen).

Debian has something similar, see http://qa.debian.org/data/bts2ldap/fullindex

Diogo Matsubara (matsubara) wrote :

Hi Lucas,

is the text format requested to be machine-parseable?

Changed in launchpad:
status: New → Incomplete
Lucas Nussbaum (lucas) wrote :

do you mean "required"? Yes, I would need it to be machine-parseable. But the current format used for the +text interface sounds good.

Changed in malone:
status: Incomplete → New
Diogo Matsubara (matsubara) wrote :

What would be the difference from having your script grab all the bug numbers using https://launchpad.net/ubuntu/+bugs-text and then grab the individual bug text, i.e. https://launchpad.net/bugs/$bugid/+text ?

Notice that the +bugs-text interface, the one which returns bug numbers only, can also be used with advanced search parameters. So, if you use the advanced search and change +bugs to +bugs-text, it'll return a page with all the bug numbers for that search. Does that help in any way?
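The two-step approach described above could be sketched roughly as follows. This is only an illustration, not Launchpad code: it assumes that +bugs-text returns one bug number per line, and the helper names and the injectable `fetch` parameter are hypothetical.

```python
from urllib.request import urlopen

BUG_LIST_URL = "https://launchpad.net/ubuntu/+bugs-text"
BUG_TEXT_URL = "https://launchpad.net/bugs/{bugid}/+text"


def http_get(url):
    """Fetch a URL and return its body as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def parse_bug_ids(listing):
    """Extract bug numbers, assuming one number per line (an assumption)."""
    return [int(token) for token in listing.split() if token.isdigit()]


def fetch_all_bugs(fetch=http_get):
    """Yield (bug id, +text body) pairs: one request for the list, then one per bug."""
    for bugid in parse_bug_ids(fetch(BUG_LIST_URL)):
        yield bugid, fetch(BUG_TEXT_URL.format(bugid=bugid))
```

Note that this costs one HTTP request per bug on top of the request for the list, which is the overhead discussed in the rest of this report.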

Changed in malone:
status: New → Incomplete

Lucas Nussbaum (lucas) wrote :

On 19/05/08 at 18:53 -0000, Diogo Matsubara wrote:
> What's the difference to make your script grab all the bug numbers using
> https://launchpad.net/ubuntu/+bugs-text and then grab the individual
> bug-text i.e. https://launchpad.net/bugs/$bugid/+text ?
>
> Notice that the +bugs-text interface, the one which returns bug numbers
> only, can also be used with advanced search parameters. So, if you use
> the advanced search, and change +bugs to +bugs-text, it'll return a page
> with all the bug numbers for that search. Does that help in any way?

In this bug report, I'm not interested in using advanced search. I just
want to fetch all the bugs.

In #231076, yes, advanced text search would be useful, but I don't see how I would use it to fetch all the bugs affecting the packages listed on http://qa.debian.org/developer.php?<email address hidden>&uploader=0x023B3F4F, for example.

In fact, if you could add something like
https://launchpad.net/ubuntu/+bugs-fulltext that would display the
content of /$bugid/+text, I think that would be fine for me (assuming
you add the necessary separators between bugs, so it's possible to
separate them in the output).
--
| Lucas Nussbaum
| <email address hidden> http://www.lucas-nussbaum.net/ |
| jabber: <email address hidden> GPG: 1024D/023B3F4F |
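The request for separators between bugs in a combined dump can be illustrated with a small sketch. The separator string and function names below are hypothetical; nothing here reflects an actual Launchpad output format, and a real format would also need to handle the marker appearing inside a bug's own text.

```python
# Hypothetical marker line written after each bug in the combined dump.
SEPARATOR = "\n%%%% END OF BUG %%%%\n"


def join_bugs(bug_texts):
    """Concatenate per-bug +text bodies, terminating each with the separator."""
    return "".join(text + SEPARATOR for text in bug_texts)


def split_bugs(dump):
    """Recover the per-bug bodies from a combined dump."""
    # Every bug is terminated by the separator, so the final piece is empty.
    return dump.split(SEPARATOR)[:-1]
```

With such a terminator, a consumer can split the single file back into individual bug records without any extra HTTP requests.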

Lucas Nussbaum (lucas) on 2008-05-20
Changed in malone:
status: Incomplete → New
Lucas Nussbaum (lucas) wrote :

Just a note: I started to work on a script that uses the text interface to fetch all Ubuntu bugs into a DB. Apparently this works well enough, and I'm able to fetch all the bugs in less than 1h.

Tom Berger wrote :

This can now be done using the webservice API. Lucas, shall we mark this bug WONTFIX?

Lucas Nussbaum (lucas) wrote :

On 28/01/09 at 21:40 -0000, Tom Berger wrote:
> This can now be done using the webservice api.

How?
How can I, using a single HTTP request, fetch the data for say 1000
bugs?
--
| Lucas Nussbaum
| <email address hidden> http://www.lucas-nussbaum.net/ |
| jabber: <email address hidden> GPG: 1024D/023B3F4F |

Tom Berger wrote :

Why must it be a single HTTP request? With the API you can achieve the same, but you have to iterate over the bugs in batches. That's not very fast, but it's workable and should be OK if you don't do it too often.
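Iterating in batches, as suggested here, might look like the sketch below. It is generic and does not use the actual Launchpad API client: `fetch_batch` is a hypothetical callable standing in for whatever request returns a slice of results, and the default batch size is an arbitrary assumption.

```python
def iter_in_batches(fetch_batch, batch_size=75):
    """Yield every item by requesting consecutive batches.

    fetch_batch(start, size) is a hypothetical callable that returns a list
    of at most `size` items beginning at offset `start`.
    """
    start = 0
    while True:
        batch = fetch_batch(start, batch_size)
        yield from batch
        if len(batch) < batch_size:
            # A short (or empty) batch means the collection is exhausted.
            break
        start += batch_size
```

Each batch is still a separate HTTP request, so the total number of requests is roughly the number of bugs divided by the batch size.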

Lucas Nussbaum (lucas) wrote :

On 28/01/09 at 23:14 -0000, Tom Berger wrote:
> Why must it be a single HTTP request?

For efficiency?

> With the API you can achieve the
> same, but you have to iterate over the bugs in batches. That's not very
> fast, but it's workable and should be OK if you don't do it too often.

I'm already fetching the bug data using the +text interface. It takes a
few hours every day, and Launchpad had problems handling the load, so I
was asked to reduce the number of concurrent threads.

Anyway, please just leave the bug open, since it's clearly not fixed by
the webservice API.
--
| Lucas Nussbaum
| <email address hidden> http://www.lucas-nussbaum.net/ |
| jabber: <email address hidden> GPG: 1024D/023B3F4F |

Tom Berger wrote :

Did you try using the API? If yes, what problems did you find? I realise that this would take several hours - that's expected - and I wouldn't expect anyone to do this more than once daily.

There's no problem leaving the bug open, but I want to make sure that we have a solution that works, rather than a feature request that never gets implemented.

Lucas Nussbaum (lucas) wrote :

On 28/01/09 at 23:48 -0000, Tom Berger wrote:
> Did you try using the API? If yes, what problems did you find? I realise
> that this would take several hours - that's expected - and I wouldn't
> expect anyone doing this more than once daily.

Why would it be faster than using +text?
--
| Lucas Nussbaum
| <email address hidden> http://www.lucas-nussbaum.net/ |
| jabber: <email address hidden> GPG: 1024D/023B3F4F |

In all likelihood it won't be faster than +text, but it's sometimes better to try and optimize one general case than many.

Christian Reis (kiko) wrote :

Well, the idea that I had to help Lucas out was to offer an rsyncable directory of +text dumps. What do you guys think of that?

Graham Binns (gmb) wrote :

> Well, the idea that I had to help Lucas out was to offer an rsyncable directory of +text dumps. What do you guys think of that?

+1 from me.

Stuart Bishop (stub) wrote :

I think it is a lot of work getting a directory of rsyncable +text dumps generated and made available for access, and it drives people further away from the tool that is supposed to make this sort of use case possible: the Launchpad APIs. Shouldn't we see what features are missing from the Launchpad APIs that stop them being used to extract the relevant information before doing this sort of hack, which we will have to keep supporting?

Christian Reis (kiko) wrote :

In practice it isn't that much work; a shell script and wget could pull
the data out. It's some IS work but I will talk it over with James.
Making this efficiently available through the API is not a short-term
goal, though sure, knowing what we need to add to them is never a bad
idea.

Lucas Nussbaum (lucas) wrote :

Well, if you turn that into a hack based on wget, I can continue to run wget myself instead ;)

Christian Reis (kiko) wrote :

Well, the point is that the wget would be done inside the DC only when
bugs were changed, and rsyncing them is much more efficient network-wise.

Lucas Nussbaum (lucas) wrote :

OK, let's resurrect this bug.

So, I'd need information about bugs filed in Launchpad against Ubuntu packages. The goal is to import this data into UDD <http://udd.debian.org/>, and then export it to various services, like the PTS or DDPO (see http://packages.qa.debian.org/m/mpich2.html , Ubuntu box in the bottom right corner).

I'm currently parsing the text output (which might not be ideal if you don't maintain it; I'm fine with switching to JSON). My current DB schema is: http://udd.debian.org/schema/udd.html#public.table.ubuntu-bugs

The thing to aim for is network efficiency. My main problem currently is that getting all the info requires ~80k HTTP requests per day. Aggregating the data (single file, or rsync) would solve that.

Curtis Hovey (sinzui) on 2010-01-23
Changed in malone:
status: New → Triaged
importance: Undecided → Low
tags: added: feature
Robert Collins (lifeless) wrote :

So, the root issue here is that your script is polling rather than depending on events about changed bugs. This is the core inefficiency, not the time it takes to dump all that data.

Your script is O(bugs^2) over time, vs an event-driven solution which is approximately O(changes).
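The complexity argument can be illustrated with a small sketch comparing the cumulative number of requests when every bug is fetched again each day with the number needed when only changed bugs are fetched. The function names and the assumption of constant daily growth are hypothetical simplifications, not measurements.

```python
def full_refetch_requests(new_bugs_per_day, days):
    """Total requests when every existing bug is fetched again each day.

    Assumes the bug count grows by new_bugs_per_day every day, so day d
    costs new_bugs_per_day * d requests and the total grows quadratically.
    """
    return sum(new_bugs_per_day * day for day in range(1, days + 1))


def incremental_requests(changes_per_day, days):
    """Total requests when only bugs that changed are fetched each day."""
    return changes_per_day * days
```

Under these assumptions the first total grows with the square of the number of days, while the second grows only linearly with the number of changes.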

Changed in launchpad:
status: Triaged → Won't Fix