[enhancement] Add MegaRAID support to smartctl tests

Bug #1795051 reported by Peter Sabaini
This bug affects 2 people
Affects: MAAS
Status: Fix Released
Importance: Medium
Assigned to: Lee Trager

Bug Description

The smartctl-based tests often fail on machines with RAID controllers because those controllers need vendor-specific flags.

It would be great if the smartctl-based tests tried to determine the flags for common RAID cards (e.g. -d megaraid, -d cciss, etc.) before giving up (https://www.smartmontools.org/wiki/Supported_RAID-Controllers).

Revision history for this message
Andres Rodriguez (andreserl) wrote :

Thanks for the bug Peter.

I'm gonna target to 2.5 to see if we can fit it in.

Changed in maas:
milestone: none → 2.5.0rc1
status: New → Triaged
summary: - Wishlist: smartctl should try harder with raid controllers
+ [enhancement] smartctl should try harder with raid controllers
tags: added: internal wishlist
Revision history for this message
Lee Trager (ltrager) wrote : Re: [enhancement] smartctl should try harder with raid controllers

smartctl already has logic in it to automatically determine the device type. The -d flag should only be used when you want to override that logic and are sure of the device type. I'm adding smartmontools to this as I don't think MAAS should be overriding smartctl's logic with its own.

Peter, could you detail what hardware is failing the smartctl test?

Changed in maas:
status: Triaged → Incomplete
Revision history for this message
Lee Trager (ltrager) wrote :

smartmontools has a tool to update the drivedb which is used during autodetection of drive types[1]. It seems the Ubuntu version of smartmontools has this command disabled. If a new version of the drivedb.h fixes your issue we may just need to update the smartmontools packages with the latest version. Could you try the following?

# sudo mv /var/lib/smartmontools/drivedb/drivedb.h /var/lib/smartmontools/drivedb/drivedb.h.orig
# sudo wget https://www.smartmontools.org/browser/trunk/smartmontools/drivedb.h -O /var/lib/smartmontools/drivedb/drivedb.h
# sudo smartctl --all $DEVICE

[1] https://www.smartmontools.org/browser/trunk/smartmontools/update-smart-drivedb.8.in

Revision history for this message
James Troup (elmo) wrote : Re: [Bug 1795051] Re: [enhancement] smartctl should try harder with raid controllers

Lee Trager <email address hidden> writes:

> smartctl already has logic in it to automatically determine the
> device type. The -d flag should only be used when you want to
> override that logic and are sure of the device type. I'm adding
> smartmontools to this as I don't think MAAS should be overriding
> smartctl's logic with its own.

So this seems like a wildly over-optimistic interpretation of how
smartctl actually works on server hardware. In my experience -d has
always been required for both HP and Dell servers (#1 and #2 in global
server market share).

> Peter, could you detail what hardware is failing the smartctl test?

It was very likely a Dell R740.

--
James

Revision history for this message
Peter Sabaini (peter-sabaini) wrote : Re: [enhancement] smartctl should try harder with raid controllers

To add to the above, this was on a Dell R740 and R430 -- both required -d megaraid,X to run when running smartctl manually.

MAAS version: 2.3.5 (6511-gf466fdb-0ubuntu1~16.04.1) on xenial.

Test output from MAAS, for the record:

INFO: Veriying SMART support for the following drive: /dev/sdd

INFO: Running command: sudo -n smartctl --all /dev/sdd

INFO: Unable to run test. The following drive does not support SMART: /dev/sdd

Revision history for this message
Lee Trager (ltrager) wrote :

I noticed that our CI does have a MegaRAID controller which the smartctl-validate test was skipping. I tried updating drivedb.h but that did not fix the issue. I was only able to get smartctl output when run with -d megaraid,9.

@Peter, James - Do you have a way to programmatically determine which SCSI bus number should be used? I only discovered that I should use 9 by trying 0-8 first. This seems to be an issue with many RAID controllers: you need to specify the bus or port number the device uses.

smartctl returns 2 when the SCSI bus number is wrong. According to the man page, 2 means "Device open failed, device did not return an IDENTIFY DEVICE structure, or device is in a low-power mode." We can't brute-force the SCSI bus number because we don't know whether 2 means the number is wrong or the device is completely dead.
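
To illustrate the ambiguity described above, here is a minimal Python sketch that decodes a smartctl exit status. The exit status is a bit mask, and the value 2 corresponds to bit 1 being set. The bit descriptions are paraphrased from the EXIT STATUS section of smartctl(8); the names `SMARTCTL_EXIT_BITS` and `decode_smartctl_status` are made up for this example and are not part of MAAS or smartmontools.

```python
# Bit meanings paraphrased from the EXIT STATUS section of smartctl(8).
# The names below are hypothetical helpers for illustration only.
SMARTCTL_EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed, no IDENTIFY DEVICE structure, or low-power mode",
    2: "a SMART or other ATA command failed, or a checksum error was found",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes at or below threshold",
    5: "SMART status OK, but attributes were at or below threshold in the past",
    6: "device error log contains records of errors",
    7: "device self-test log contains records of errors",
}


def decode_smartctl_status(status):
    """Return the descriptions of every bit set in a smartctl exit status."""
    return [
        message
        for bit, message in sorted(SMARTCTL_EXIT_BITS.items())
        if status & (1 << bit)
    ]


if __name__ == "__main__":
    # Exit status 2 sets only bit 1, which covers both a wrong
    # "-d megaraid,N" number and a device that cannot be opened at all.
    for line in decode_smartctl_status(2):
        print(line)
```

Because a wrong device number and a dead drive both map to the same bit, the exit status alone cannot tell a brute-force loop when to stop.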

Revision history for this message
Peter Sabaini (peter-sabaini) wrote :

Lee, not sure tbh - maybe by correlating the bus number from lspci with the targets from /sys/bus/pci/devices/? Alternatively, I believe the megacli tools should output the device ID.

Changed in maas:
status: Incomplete → New
Revision history for this message
Lee Trager (ltrager) wrote :

I've poked around /sys and /proc and haven't found anything. It seems the only way to do this would be by using megacli, which is not available in the archive.

Are the SCSI bus numbers typically the same across multiple machines? We've discussed adding user-specified parameters to commissioning/testing scripts, which would allow you to specify the device type and SCSI bus number.

If so, you could create a custom smartctl-validate script which has the proper parameters in the meantime.
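
As a rough sketch of what such a parameterised script might do, the Python helper below builds the smartctl command line from an optional device type and number. It assumes the same `sudo -n smartctl --all` invocation shown in the test output earlier in this thread; the function name `build_smartctl_command` and its parameters are hypothetical and are not the actual MAAS script interface.

```python
def build_smartctl_command(device, device_type=None, device_number=None):
    """Build a smartctl argument list (hypothetical helper, not MAAS code).

    device_type is passed to smartctl's -d option, e.g. "megaraid".
    device_number, if given, is appended after a comma, e.g. "megaraid,9".
    """
    if device_number is not None:
        if device_type is None:
            raise ValueError("device_number requires device_type")
        if int(device_number) < 0:
            raise ValueError("device_number must not be negative")

    cmd = ["sudo", "-n", "smartctl", "--all"]
    if device_type is not None:
        spec = device_type
        if device_number is not None:
            spec = f"{device_type},{int(device_number)}"
        cmd += ["-d", spec]
    cmd.append(device)
    return cmd


if __name__ == "__main__":
    # Without parameters, smartctl autodetects the device type.
    print(" ".join(build_smartctl_command("/dev/sdd")))
    # With user-supplied parameters for a MegaRAID controller.
    print(" ".join(build_smartctl_command("/dev/sdd", "megaraid", 9)))
```

Returning a list rather than a single string means the result can be handed to `subprocess` without shell quoting.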

Lee Trager (ltrager)
no longer affects: smartmontools
Lee Trager (ltrager)
Changed in maas:
importance: Undecided → Medium
status: New → In Progress
assignee: nobody → Lee Trager (ltrager)
Revision history for this message
Lee Trager (ltrager) wrote :

The attached branch adds MegaRAID support to the MAAS smartctl tests. Testing only occurs on MegaRAID controllers if the storcli tool is installed by another script[1]. I'm unable to include storcli installation in master as it is a proprietary tool that you must download from Broadcom.

I was only able to add MegaRAID support, as it is the only hardware RAID I currently have access to. For other hardware RAID controllers, please open a separate bug.

[1] https://discourse.maas.io/t/running-smart-tests-against-megaraid-controllers/185

summary: - [enhancement] smartctl should try harder with raid controllers
+ [enhancement] Add MegaRAID support to smartctl tests
Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
milestone: 2.5.0rc1 → 2.5.0beta3
Changed in maas:
status: Fix Committed → Fix Released