disk/smart test fails on some server hardware

Bug #950686 reported by Brendan Donegan
This bug affects 1 person
Affects                   Status     Importance  Assigned to  Milestone
Checkbox Provider - Base  Won't Fix  Undecided   Unassigned
checkbox-certification    Invalid    Undecided   Unassigned

Bug Description

This test fails on some of the servers I run it on. The error is 'aborted by host'; I don't know whether it is serious. The full debug output is in the attachment (from the HP ProLiant DL320 G5 on which the test was run).

Revision history for this message
Brendan Donegan (brendan-donegan) wrote :

Same thing happens on Precise and Lucid, so this can be fixed in trunk and then backported. It's likely this is just something to do with the hardware that's being tested which is minor.

Revision history for this message
Jeff Lane  (bladernr) wrote : Re: [Bug 950686] Re: disk/smart test fails on some server hardware

On 03/09/2012 07:13 AM, Brendan Donegan wrote:
> Same thing happens on Precise and Lucid, so this can be fixed in trunk
> and then backported. It's likely this is just something to do with the
> hardware that's being tested which is minor.
>

What other systems does this occur on? We should look at those and see.
Doing a search of the interwebs indicates that this could be a bug in
smartctl, a bug in the OS, a firmware issue on the hard disk itself, a
failing hard disk, or the work of angry gnomes.

Point is, I wonder if there is some commonality here. It doesn't fail
across the board (and doesn't fail at all on any of the hardware I have
here) so I think we can rule out a real test bug.

Most enterprise level hard disks can have their firmware updated. I
know IBM distributes hard drive firmware, so perhaps updating the disk
firmware on this system would be a good place to start (assuming there's
an update available).

As we discovered, the tests DO seem to hang when run manually via
smartctl, but not in every case... maybe about 60% of the time...

That alone leads me to suspect the hard disk itself.
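
To put a number on "about 60% of the time", the self-test log could be tallied by status. The following is only a minimal sketch, assuming the ATA-style table printed by `smartctl -l selftest`, where each entry starts with `#` and carries a status such as "Aborted by host". The function names and the status list are illustrative assumptions, not part of checkbox, and SCSI/SAS drives print a different log layout that this does not handle.

```python
import subprocess

# Status strings as they appear in the Status column of the ATA self-test
# log. This list is an assumption and is deliberately not exhaustive.
KNOWN_STATUSES = (
    "Completed without error",
    "Aborted by host",
    "Interrupted (host reset)",
)


def tally_selftest_statuses(log_text):
    """Count how often each known status appears in self-test log entries."""
    counts = {status: 0 for status in KNOWN_STATUSES}
    for line in log_text.splitlines():
        # Log entries begin with "# <number>"; skip headers and other text.
        if not line.lstrip().startswith("#"):
            continue
        for status in KNOWN_STATUSES:
            if status in line:
                counts[status] += 1
                break
    return counts


def read_selftest_log(device):
    """Return the raw self-test log text (needs root and smartmontools)."""
    result = subprocess.run(
        ["smartctl", "-l", "selftest", device],
        capture_output=True,
        text=True,
        check=False,  # smartctl uses non-zero exit codes for many conditions
    )
    return result.stdout
```

On an affected server this might be called as `tally_selftest_statuses(read_selftest_log("/dev/sda"))`, where `/dev/sda` is a placeholder device path. Comparing the "Aborted by host" count against the total across several systems would show whether the aborts track a particular disk model or firmware.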

--
Jeff Lane - Hardware Certification Engineer and Test Tools Developer
Ubuntu Ham: W4KDH
Freenode IRC: bladernr or bladernr_
gpg: 1024D/3A14B2DD 8C88 B076 0DD7 B404 1417 C466 4ABD 3635 3A14 B2DD

Changed in checkbox:
status: New → Confirmed
Changed in checkbox-certification:
status: New → Invalid
Revision history for this message
Jeff Lane  (bladernr) wrote :

I don't think this is necessarily a cert issue, nor do I really think it's a checkbox issue at this point. One thing I've also seen suggested as a possible solution (or at least as an extra data point) is checking whether other similar tests fail...

This, however, would require tools from the hard disk manufacturer, for example:

http://www.seagate.com/www/en-us/support/downloads/seatools

for Seagate drives. It would also (most likely) require Windows or a DOS environment, and direct physical access to the server.

Zygmunt Krynicki (zyga)
affects: checkbox → plainbox-provider-checkbox
Revision history for this message
Zygmunt Krynicki (zyga) wrote :

I'm marking this as WONT FIX but please reopen if this is actively hampering anyone's work. My goal is to limit the number of open bugs to get a better idea as to what is really important.

Remember that you can always escalate bugs by contacting us in #checkbox on freenode (or #cert-infra in the internal IRC) or by responding in bugs directly.

Changed in plainbox-provider-checkbox:
status: Confirmed → Won't Fix