HWDB submissions since Lucid are marked as Invalid

Bug #835103 reported by Francis J. Lacoste on 2011-08-26
This bug affects 1 person
Affects: Checkbox (Importance: Medium, Assigned to: Unassigned)
Affects: Launchpad itself (Importance: Critical, Assigned to: Abel Deuring)

Bug Description

It seems that almost all HWDB submissions made since Lucid are not processed successfully. They are all marked as Invalid as can be seen in https://pastebin.canonical.com/51831/.

Francis J. Lacoste (flacoste) wrote :

To identify the problem, we will probably need to get one of these Invalid files and see what error the parser gives. We should also question the fact that these Invalid submissions have gone unreported for over a year.

Once the problem is fixed, we probably want to reprocess all of the invalid submissions.

Marc Tardif (cr3) wrote :

First, I would have a look at the XML of some submissions marked as invalid. I have noticed a <udev> tag in some cases, so I would confirm this is the case.

Second, assuming there is indeed a <udev> tag, I'd make sure that the schema validator supports that tag. Then, make sure that the parser actually does something with that tag.

I suspect that might be a likely candidate to explain all the invalid submissions.

Abel Deuring (adeuring) wrote :

I checked a few submissions from maverick:

ERROR:root:Parsing submission test id: Relax NG validation failed.
/tmp/tmpGNCOQ7:5355: element hardware: Relax-NG validity error : Expecting an element sysfs-attributes, got nothing
/tmp/tmpGNCOQ7:5355: element hardware: Relax-NG validity error : Invalid sequence in interleave
/tmp/tmpGNCOQ7:5355: element hardware: Relax-NG validity error : Invalid sequence in interleave
/tmp/tmpGNCOQ7:5355: element hardware: Relax-NG validity error : Element hardware failed to validate content

I suspect that we need to either tweak the data a bit or that we have to "relax" the RelaxNG schema. Shouldn't be that hard.

Changed in launchpad:
assignee: nobody → Abel Deuring (adeuring)
status: Triaged → In Progress
Abel Deuring (adeuring) wrote :

So, the <sysfs-attributes> element is simply missing :(

Not such a huge problem, but bad nevertheless: This data is used only to get the SCSI vendor and model name. We can _probably_ find this data for SCSI block devices in the udevadm output, but not for SCSI scanners (and most likely also other non-block SCSI devices, like the "robot part" of tape libraries). The following is the part of the output of "udevadm info --export-db" (that's the content of the <udev> node in the submitted data) for an Adaptec 1480A SCSI adapter and a Sharp JX250 scanner connected to the adapter:

P: /devices/pci0000:00/0000:00:1e.0/0000:15:00.0/0000:16:00.0/host5
E: UDEV_LOG=3
E: DEVPATH=/devices/pci0000:00/0000:00:1e.0/0000:15:00.0/0000:16:00.0/host5
E: SUBSYSTEM=scsi
E: DEVTYPE=scsi_host

P: /devices/pci0000:00/0000:00:1e.0/0000:15:00.0/0000:16:00.0/host5/scsi_host/host5
E: UDEV_LOG=3
E: DEVPATH=/devices/pci0000:00/0000:00:1e.0/0000:15:00.0/0000:16:00.0/host5/scsi_host/host5
E: SUBSYSTEM=scsi_host

P: /devices/pci0000:00/0000:00:1e.0/0000:15:00.0/0000:16:00.0/host5/spi_host/host5
E: UDEV_LOG=3
E: DEVPATH=/devices/pci0000:00/0000:00:1e.0/0000:15:00.0/0000:16:00.0/host5/spi_host/host5
E: SUBSYSTEM=spi_host

P: /devices/pci0000:00/0000:00:1e.0/0000:15:00.0/0000:16:00.0/host5/target5:0:6
E: UDEV_LOG=3
E: DEVPATH=/devices/pci0000:00/0000:00:1e.0/0000:15:00.0/0000:16:00.0/host5/target5:0:6
E: SUBSYSTEM=scsi
E: DEVTYPE=scsi_target

P: /devices/pci0000:00/0000:00:1e.0/0000:15:00.0/0000:16:00.0/host5/target5:0:6/5:0:6:0
E: UDEV_LOG=3
E: DEVPATH=/devices/pci0000:00/0000:00:1e.0/0000:15:00.0/0000:16:00.0/host5/target5:0:6/5:0:6:0
E: SUBSYSTEM=scsi
E: DEVTYPE=scsi_device
E: MODALIAS=scsi:t-0x06

P: /devices/pci0000:00/0000:00:1e.0/0000:15:00.0/0000:16:00.0/host5/target5:0:6/5:0:6:0/bsg/5:0:6:0
N: bsg/5:0:6:0
S: char/253:2
E: UDEV_LOG=3
E: DEVPATH=/devices/pci0000:00/0000:00:1e.0/0000:15:00.0/0000:16:00.0/host5/target5:0:6/5:0:6:0/bsg/5:0:6:0
E: SUBSYSTEM=bsg
E: DEVNAME=/dev/bsg/5:0:6:0
E: MAJOR=253
E: MINOR=2
E: DEVLINKS=/dev/char/253:2

P: /devices/pci0000:00/0000:00:1e.0/0000:15:00.0/0000:16:00.0/host5/target5:0:6/5:0:6:0/scsi_device/5:0:6:0
E: UDEV_LOG=3
E: DEVPATH=/devices/pci0000:00/0000:00:1e.0/0000:15:00.0/0000:16:00.0/host5/target5:0:6/5:0:6:0/scsi_device/5:0:6:0
E: SUBSYSTEM=scsi_device

P: /devices/pci0000:00/0000:00:1e.0/0000:15:00.0/0000:16:00.0/host5/target5:0:6/5:0:6:0/scsi_generic/sg2
N: sg2
S: char/21:2
E: UDEV_LOG=3
E: DEVPATH=/devices/pci0000:00/0000:00:1e.0/0000:15:00.0/0000:16:00.0/host5/target5:0:6/5:0:6:0/scsi_generic/sg2
E: SUBSYSTEM=scsi_generic
E: DEVNAME=/dev/sg2
E: MAJOR=21
E: MINOR=2
E: DEVLINKS=/dev/char/21:2

No trace of the strings "SHARP" or "JX250"....
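
The search described above can be repeated mechanically. The following is a minimal sketch, assuming the content of the <udev> node has already been extracted as plain text. The helper names parse_udev_db and find_value are hypothetical and are not part of Launchpad or checkbox. It splits the "udevadm info --export-db" output into one record per device and looks for a string in the property values.

```python
import re


def parse_udev_db(text):
    """Split `udevadm info --export-db` output into one dict per device.

    Each record keeps the device path (P), the device node name (N),
    the symlinks (S) and the properties (E) as a key/value dict.
    """
    devices = []
    # Device records are separated by blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        device = {"P": None, "N": None, "S": [], "E": {}}
        for line in block.splitlines():
            prefix, _, value = line.partition(": ")
            if prefix == "E":
                key, _, val = value.partition("=")
                device["E"][key] = val
            elif prefix == "S":
                device["S"].append(value)
            elif prefix in ("P", "N"):
                device[prefix] = value
        if device["P"] is not None:
            devices.append(device)
    return devices


def find_value(devices, needle):
    """Return the paths of devices with a property value containing `needle`.

    The comparison is case-insensitive.
    """
    needle = needle.lower()
    return [
        d["P"] for d in devices
        if any(needle in v.lower() for v in d["E"].values())
    ]


# Two records taken from the output quoted above.
SAMPLE = """\
P: /devices/pci0000:00/0000:00:1e.0/0000:15:00.0/0000:16:00.0/host5/target5:0:6/5:0:6:0
E: UDEV_LOG=3
E: SUBSYSTEM=scsi
E: DEVTYPE=scsi_device
E: MODALIAS=scsi:t-0x06

P: /devices/pci0000:00/0000:00:1e.0/0000:15:00.0/0000:16:00.0/host5/target5:0:6/5:0:6:0/scsi_generic/sg2
N: sg2
S: char/21:2
E: UDEV_LOG=3
E: SUBSYSTEM=scsi_generic
E: DEVNAME=/dev/sg2
E: MAJOR=21
E: MINOR=2
"""

devices = parse_udev_db(SAMPLE)
for name in ("SHARP", "JX250"):
    print(name, find_value(devices, name))
```

For these two records both searches come back empty, which matches the observation above that the vendor and model strings are not available from the udev data for this scanner.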

Abel Deuring (adeuring) wrote :

Francis J. Lacoste (flacoste) wrote on 2011-08-26:
> We should also question the fact that these Invalid submissions have been unreported for over a year.

- the QA team could check the processing logs on a regular basis. Especially when a new Ubuntu version is released.
- Processing a submission needs ca 1 second, so we could do that during the upload request. If we provide a proper RESTful API, checkbox could get an immediate error response.
- checkbox could do a RelaxNG verification before submitting a report. If we publish the RelaxNG schema used by Launchpad to process submissions via an HTTP URL and if checkbox uses the file from this URL for verification, we can be much more sure than now that checkbox and Launchpad don't have, let's say, "communication problems".
- we should do more QA on the Launchpad side when a new checkbox version is released. (BTW, is there a way to subscribe to the event "new version of package X released"?)
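
To illustrate the idea of a client-side check before submitting, here is a rough sketch using only the Python standard library. It is not a RelaxNG validation: a real check would validate against the schema published by Launchpad using a RelaxNG-capable library. The sketch only tests whether the <hardware> element contains the child elements named in this thread. The function name, the list of required children and the <system> root element in the sample data are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: only the two element names mentioned in this bug report.
# The real schema requires more than this.
REQUIRED_HARDWARE_CHILDREN = ("udev", "sysfs-attributes")


def missing_hardware_children(xml_text):
    """Return the names of required elements missing from the report."""
    root = ET.fromstring(xml_text)
    hardware = root if root.tag == "hardware" else root.find(".//hardware")
    if hardware is None:
        return ["hardware"]
    present = {child.tag for child in hardware}
    return [name for name in REQUIRED_HARDWARE_CHILDREN if name not in present]


# Hypothetical, heavily shortened submissions.
GOOD = "<system><hardware><udev>P: /devices</udev><sysfs-attributes/></hardware></system>"
BAD = "<system><hardware><udev>P: /devices</udev></hardware></system>"

print(missing_hardware_children(GOOD))
print(missing_hardware_children(BAD))
```

A client could refuse to submit, or at least warn, when the returned list is not empty. Unlike a schema validation, this says nothing about the order or the content of the elements.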

Deryck Hodge (deryck) wrote :

Thanks for taking on this work, Abel! I was glad to see it in progress when I started work today. :)

As for Francis' concerns about this going unreported for a year, I think we could do a couple things:

 * add some integration test between checkbox and the hwdb, so we get
    something that blows up if they get out of sync :)
 * add stats in lpstats for invalid submissions by distro series

We would need the integration test run automatically somewhere and we would need to remember to look at the stats, but both of these would give us some sense of the health of checkbox/hwdb integration, I think.
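
As a sketch of what the per-series stats could compute, assuming the submissions are available as (distro series, status) pairs. The function name and the sample data are made up for illustration and do not reflect the real HWDB tables or lpstats.

```python
from collections import Counter


def invalid_ratio_by_series(submissions):
    """Map each distro series to the fraction of its submissions marked Invalid.

    `submissions` is an iterable of (distro_series, status) pairs.
    """
    total = Counter()
    invalid = Counter()
    for series, status in submissions:
        total[series] += 1
        if status == "Invalid":
            invalid[series] += 1
    return {series: invalid[series] / total[series] for series in total}


# Made-up example data, not real numbers.
example = [
    ("karmic", "Processed"),
    ("karmic", "Processed"),
    ("lucid", "Processed"),
    ("lucid", "Invalid"),
    ("maverick", "Invalid"),
    ("maverick", "Invalid"),
]

for series, ratio in sorted(invalid_ratio_by_series(example).items()):
    print(series, ratio)
```

A series whose ratio jumps towards 1.0 right after a release would be the signal to look at.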

Francis J. Lacoste (flacoste) wrote :

I reported bug #836730 about having a complete integration test in checkbox.

Bug is about getting OOPS recorded so that it's easier to track. We could also have a NAGIOS check for unexpected burst in invalid submissions.
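
A Nagios check along these lines could be quite small. The sketch below only assumes the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function name and the thresholds are hypothetical, and how the counts are obtained from the HWDB is left open.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_invalid_ratio(invalid, total, warn=0.2, crit=0.5):
    """Return (exit_code, message) for the share of invalid submissions.

    `warn` and `crit` are assumed thresholds, given as fractions of `total`.
    """
    if total == 0:
        return UNKNOWN, "UNKNOWN - no submissions in the sampled window"
    ratio = invalid / total
    percent = 100.0 * invalid / total
    message = "%d of %d submissions invalid (%.0f%%)" % (invalid, total, percent)
    if ratio >= crit:
        return CRITICAL, "CRITICAL - " + message
    if ratio >= warn:
        return WARNING, "WARNING - " + message
    return OK, "OK - " + message


code, message = check_invalid_ratio(95, 100)
print(code, message)
```

An actual plugin would print the message and pass the code to sys.exit().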

Francis J. Lacoste (flacoste) wrote :

Bug #836733 is about the missing OOPS.

Abel Deuring (adeuring) wrote :

Marc Tardif (cr3) wrote on 2011-08-26:

> First, I would have a look at the XML of some submissions marked as invalid. I have noticed a <udev> tag in some cases, so I would confirm this is the case.
>
> Second, assuming there is indeed a <udev> tag, I'd make sure that the schema validator supports that tag. Then, make sure that the parser actually does something with that tag.
>
> I suspect that might be a likely candidate to explain all the invalid submissions.

Marc, right, the bug is related to the <udev> node. The node itself is fine, but it does not provide enough data, as already explained. And you are right that the check against the RelaxNG schema failed.

The point of the RelaxNG check is this: consider the RelaxNG schema as a kind of a contract between checkbox and Launchpad. Launchpad has to make some assumptions about the data it can expect from the client: what sort of information does the client provide, as well as how the data is represented.

Let's make sure that a problem like this, which is trivial at its core, does not go unnoticed again for such a long time. Let's try to prevent this at its root: Let us better synchronize changes in checkbox and changes in Launchpad.

Launchpad QA Bot (lpqabot) wrote :
tags: added: qa-needstesting
Changed in launchpad:
status: In Progress → Fix Committed
Abel Deuring (adeuring) on 2011-08-30
tags: added: qa-ok
removed: qa-needstesting

We already tell the client that the submission failed

Francis J. Lacoste (flacoste) wrote :

How? The processing is done offline by a cron script. We don't validate the file as soon as it is submitted. Do we email the user that processing his hardware profile failed? Why would he care anyway? That's a problem for the checkbox developers / LP devs to handle. And the problem here is that a failure for one user isn't really telling; the important piece of info is that it was failing for all users.

Robert Collins (lifeless) wrote :

I'm not 100% sure but there are tests that the view has errors set... See
the tests I changed last week

On 31.08.2011 02:01, Robert Collins wrote:
> I'm not 100% sure but there are tests that the view has errors set... See
> the tests I changed last week

The extra HTTP headers in the view? They are just about "bureaucratic
errors" about missing or invalid values for form fields, but not about
the question if the content of the hardware report is useful.

Ah. Well it was those that I squished oopses from :) perhaps we should just
process inline?

On 31.08.2011 08:22, Robert Collins wrote:
> Ah. Well it was those that I squished oopses from :) perhaps we should just
> process inline?

Agreed. But when we do this, we should also provide a webservice API
method to upload a hardware report. The API, at least when used with
launchpadlib, has a much more reliable way to forward errors to the
client, compared with what checkbox has to do today to figure out if
something went wrong with the upload.

William Grant (wgrant) on 2011-08-31
Changed in launchpad:
status: Fix Committed → Fix Released

It seems this bug was fixable on the Launchpad side, but is there anything that needs to be done Checkbox side here? I see some mention of integration testing and Checkbox doing some validation before submitting reports.

Changed in checkbox:
status: New → Incomplete
importance: Undecided → Medium

On 15.11.2011 23:06, Brendan Donegan wrote:
> It seems this bug was fixable on the Launchpad side, but is there
> anything that needs to be done Checkbox side here? I see some mention of
> integration testing and Checkbox doing some validation before submitting
> reports.

well, it was "fixable" in the sense that LP now can cope with incomplete
data: The node <sysfs-attributes> was missing in the submitted data,
meaning that we don't store anything about SCSI devices...

checkbox should again include this data in future releases.

Daniel Manrique (roadmr) wrote :

Precise submissions seem to have the <sysfs-attributes> node now. I'll set to Fix Released, but if you see invalid submissions in Launchpad, please let us know to either reopen this report or create a new one.

Thanks!

Changed in checkbox:
status: Incomplete → Fix Released