Comment 221 for bug 445852

Martin Pitt (pitti) wrote:

I got ssh access to an affected machine and finally tracked this down. I also compared libatasmart's ioctls to the ones that smartmontools issues.

My raw notes with all the ioctl stracing, bisecting, etc. are in https://bugs.launchpad.net/ubuntu/karmic/+source/libatasmart/+bug/445852/comments/202 .

Summary: It seems that READ_THRESHOLDS without a following READ_DATA (or a different ioctl like RETURN_STATUS) causes this problem: the drive "wants" to send more data, which is never flushed. Possible explanation: https://bugzilla.kernel.org/show_bug.cgi?id=14583#c25

http://git.0pointer.de/?p=libatasmart.git;a=commitdiff;h=a223a4f6277a9f006b722b13671d5292dc6339bb fixed this more or less inadvertently, which explains why we don't see the problem on our development releases.

As the commit shows, sk_disk_open() used to call sk_disk_smart_read_thresholds(), but not sk_disk_smart_read_data().
udisks-probe-ata-smart and skdump --can-smart just call sk_disk_open() and sk_disk_smart_is_available() (the latter does not do any I/O itself, just tests a flag).

So while a223a4 fixes this for the "common" use cases, there might still be situations where the thresholds are read, but not the values. Let's look at where init_smart() (the only place that reads thresholds) is called:

 * sk_disk_smart_read_data(): OK, does READ_DATA

 * sk_disk_smart_status(): Does SK_SMART_COMMAND_RETURN_STATUS after init_smart(), confirmed to work

 * sk_disk_smart_self_test(): OK, calls sk_disk_smart_read_data()

So right now, all code paths work.
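
The audit above can be written down as a toy model. This is only a sketch of the hypothesis from the summary, not of real drive firmware or of libatasmart's code: it assumes that READ_THRESHOLDS leaves data pending and that READ_DATA or RETURN_STATUS clears it. All model_* names are made up for illustration.

```c
#include <stdbool.h>

/* Toy model of the hypothesis only, not of real drive behaviour. */
typedef enum {
    CMD_READ_THRESHOLDS,
    CMD_READ_DATA,
    CMD_RETURN_STATUS
} model_cmd;

typedef struct {
    bool pending; /* true while the modelled drive still "wants" to send data */
} model_drive;

/* Assumption: only READ_THRESHOLDS leaves data pending;
 * any other modelled command clears it again. */
void model_send(model_drive *d, model_cmd cmd) {
    d->pending = (cmd == CMD_READ_THRESHOLDS);
}

/* Stand-in for init_smart(), the only place that reads thresholds. */
void model_init_smart(model_drive *d) {
    model_send(d, CMD_READ_THRESHOLDS);
}

/* sk_disk_open() before a223a4: thresholds only, nothing afterwards. */
bool model_open_before_a223a4(void) {
    model_drive d = { false };
    model_send(&d, CMD_READ_THRESHOLDS);
    return d.pending;
}

/* sk_disk_smart_read_data(): init_smart(), then READ_DATA. */
bool model_read_data(void) {
    model_drive d = { false };
    model_init_smart(&d);
    model_send(&d, CMD_READ_DATA);
    return d.pending;
}

/* sk_disk_smart_status(): init_smart(), then RETURN_STATUS. */
bool model_status(void) {
    model_drive d = { false };
    model_init_smart(&d);
    model_send(&d, CMD_RETURN_STATUS);
    return d.pending;
}

/* sk_disk_smart_self_test(): reaches READ_DATA via read_data(). */
bool model_self_test(void) {
    return model_read_data();
}
```

Each function returns whether the modelled drive is left with pending data. Under the stated assumption, only the old sk_disk_open() path does.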

However, a potentially more robust solution would be to make init_smart() call sk_disk_smart_read_data() right after sk_disk_smart_read_thresholds(). data_is_valid would then already be TRUE by the time self_test() runs, so its behaviour would not change. sk_disk_smart_read_data() could test the flag to avoid reading the data twice. For sk_disk_smart_status() this would add one unused READ_DATA call, but that probably would not hurt much.
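
A rough sketch of that idea, with command counters in place of real ioctls. This is not libatasmart code: the sketch_* names, the struct layout and the smart_initialized guard are assumptions for illustration, and only data_is_valid mirrors the flag mentioned above. To avoid init_smart() and read_data() calling each other recursively, the sketch issues READ_DATA through a small helper.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the disk object; counters replace real I/O. */
typedef struct {
    bool smart_initialized; /* assumed guard so init runs only once */
    bool data_is_valid;     /* set once READ_DATA has been issued */
    int n_read_thresholds;
    int n_read_data;
    int n_return_status;
} sketch_disk;

/* Issue one (counted) READ_DATA and mark the data as valid. */
void sketch_issue_read_data(sketch_disk *d) {
    d->n_read_data++;
    d->data_is_valid = true;
}

/* Proposed init_smart(): READ_THRESHOLDS immediately followed by
 * READ_DATA, so thresholds are never read on their own. */
void sketch_init_smart(sketch_disk *d) {
    if (d->smart_initialized)
        return;
    d->smart_initialized = true;
    d->n_read_thresholds++;
    sketch_issue_read_data(d);
}

/* read_data(): test the flag to avoid reading the data twice. */
void sketch_read_data(sketch_disk *d) {
    sketch_init_smart(d);
    if (d->data_is_valid)
        return;
    sketch_issue_read_data(d);
}

/* status(): pays for one unused READ_DATA inside init_smart(). */
void sketch_status(sketch_disk *d) {
    sketch_init_smart(d);
    d->n_return_status++;
}

/* self_test(): data is already valid after init_smart(), so no
 * second READ_DATA; the self-test command itself is not modelled. */
void sketch_self_test(sketch_disk *d) {
    sketch_read_data(d);
}
```

In this sketch every path issues exactly one READ_THRESHOLDS and one READ_DATA. A real implementation would also need some way to re-read stale data on demand, which the flag test as sketched here would prevent.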