Comment 71 for bug 61235

Brian K. White (bkw777) wrote :

Well I've been poking at this a bit and have learned a few things.

I don't think the code in the fix_usb_hd.sh above is quite right. It uses expr to get character 3 out of argv1, which was supplied via the udev rule code "%k", and then uses that value as part of the path to a spot under /sys for that device. That's definitely going to fail a lot, and I'm actually surprised it ever works except by chance.
The version from the guy who put * in place of that number probably always works, although it's trying to do way more than you want.

I made a script and triggered it from udev, matching on the model name as well as the vendor as per the latest hints.
I put all the available %codes on the command line and had the script create a unique temp file named by pid, then write the command line and the environment to that file like so:

-----
L=/tmp/udev_fix_usb_hd.$$
echo "$@" >$L
date >>$L
set >>$L
-----

That produces lots of interesting info.
The first interesting thing is that it runs 3 times each time the usb cable is connected.
The contents of the debug files are different each time, as the device gets recognized by different layers: it is initially a scsi_device, then becomes a scsi_generic, then finally becomes a scsi_disk.

Here is the output of a sample plug-in event.
Watch the values for %k.
In this case character 3 of %k was never correct for the way it's used in the script above.
However, we can see what is correct, and we don't have to use expr to extract it from something else either.
Each "pass" is the interesting parts of one temp file. All 3 files were part of the same event.
They are shown in the order they happened chronologically. They all happened in the same second, so the timestamps can't order them, but the pid numbers in the filenames and the $PPID values inside the files seem to bear that order out.

The udev rule was 91-fix_usb_hd :
SUBSYSTEMS=="scsi", ATTRS{vendor}=="Seagate", ATTRS{model}=="FreeAgentDesktop", RUN+="/sbin/fix_usb_hd %%k='%k' %%n='%n' %%p='%p' %%b='%b' %%M='%M' %%m='%m' %%P='%P' %%r='%r' %%N='%N'"

The script is the one I already showed above.

I manually broke the %codes into separate lines for easier reading, and the variables shown are just the ones that are in any way interesting or came from udev. I left out stuff like TERM, but only stuff like that. So when you see a variable in one block and not in another, it's because it really wasn't there.

== pass 1
%k='16:0:0:0'
%n='0'
%p='/class/scsi_device/16:0:0:0'
%b='16:0:0:0'
%M='0'
%m='0'
%P=''
%r='/dev'
%N='/dev/.tmp-0-0'

DEVPATH=/class/scsi_device/16:0:0:0
PHYSDEVBUS=scsi
PHYSDEVDRIVER=sd
PHYSDEVPATH=/devices/pci0000:00/0000:00:1d.7/usb1/1-1/1-1:1.0/host16/target16:0:0/16:0:0:0
SUBSYSTEM=scsi_device
UDEVD_EVENT=1

== pass 2
%k='sg6'
%n='6'
%p='/class/scsi_generic/sg6'
%b='sg6'
%M='21'
%m='6'
%P=''
%r='/dev'
%N='/dev/.tmp-21-6'

DEVNAME=/dev/sg6
DEVPATH=/class/scsi_generic/sg6
MAJOR=21
MINOR=6
PHYSDEVBUS=scsi
PHYSDEVDRIVER=sd
PHYSDEVPATH=/devices/pci0000:00/0000:00:1d.7/usb1/1-1/1-1:1.0/host16/target16:0:0/16:0:0:0
SUBSYSTEM=scsi_generic
UDEVD_EVENT=1

== pass 3
%k='sdh'
%n=''
%p='/block/sdh'
%b='sdh'
%M='8'
%m='112'
%P=''
%r='/dev'
%N='/dev/.tmp-8-112'

DEVLINKS='/dev/disk/by-id/usb-Seagate_FreeAgentDesktop_9QG00QCQ /dev/disk/by-path/pci-0000:00:1d.7-usb-0:1:1.0-scsi-0:0:0:0 /dev/disk/by-uuid/07d6ac11-e314-46a2-8d9e-0ebabb98ef19 /dev/disk/by-label/BACKUPS'
DEVNAME=/dev/sdh
DEVPATH=/block/sdh
ID_BUS=usb
ID_FS_LABEL=BACKUPS
ID_FS_LABEL_SAFE=BACKUPS
ID_FS_TYPE=ext2
ID_FS_USAGE=filesystem
ID_FS_UUID=07d6ac11-e314-46a2-8d9e-0ebabb98ef19
ID_FS_VERSION=1.0
ID_MODEL=FreeAgentDesktop
ID_PATH=pci-0000:00:1d.7-usb-0:1:1.0-scsi-0:0:0:0
ID_REVISION=100D
ID_SERIAL=Seagate_FreeAgentDesktop_9QG00QCQ
ID_TYPE=disk
ID_VENDOR=Seagate
MAJOR=8
MINOR=112
PHYSDEVBUS=scsi
PHYSDEVDRIVER=sd
PHYSDEVPATH=/devices/pci0000:00/0000:00:1d.7/usb1/1-1/1-1:1.0/host16/target16:0:0/16:0:0:0
SUBSYSTEM=block
UDEVD_EVENT=1

Notice that the only time character 3 of %k was even a number (the "6" in pass 2), it wasn't the right number for how the script earlier in this thread tries to use it. The right number in this case would be 16. The other two times the script would have tried to use a ":" and an "h".
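
To make that concrete, here is a minimal sketch that pulls character 3 out of the three %k values recorded above. The third_char helper is just a name I made up for illustration, and it uses cut rather than the expr call from the earlier script, so treat it as an approximation of what that script does, not a copy of it.

```shell
# Hypothetical helper: print the third character of its argument.
third_char() {
    printf '%s' "$1" | cut -c3
}

# The three %k values seen in pass 1, pass 2 and pass 3.
for k in '16:0:0:0' 'sg6' 'sdh'; do
    printf '%s -> %s\n' "$k" "$(third_char "$k")"
done
```

None of the three results is the 16 that the /sys path needs.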

So, from all that I distill 2 possible scripts that work. Well, one script with two user-selectable modes of operation:
One that sets allow_restart, and allows the drive to go to sleep.
And one that uses sdparm to disable the standby timer, and does not allow the drive to go to sleep.

Although we have that nice 16:0:0:0 value right there in several forms in the earlier passes, at that point the device is not yet a scsi_disk, so there is no /sys/class/scsi_disk entry for it yet. It's there at practically the same time, but we can't count on that. In pass 3, though, it is still possible to clip it out of another variable with a little shell syntax, since it happens to be right at the end of the value with a nice easy character to split on:

slosh:~ # PHYSDEVPATH="/devices/pci0000:00/0000:00:1d.7/usb1/1-1/1-1:1.0/host16/target16:0:0/16:0:0:0"
slosh:~ # echo ${PHYSDEVPATH##*/}
16:0:0:0
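
If you want to be a bit more defensive, the same expansion can be wrapped in a small check that the clipped value really looks like four colon-separated numbers before it gets used in a /sys path. This is only a sketch; scsi_addr is a hypothetical helper name and is not part of the script I'm actually running.

```shell
# Hypothetical helper: print the trailing host:channel:target:lun part of a
# path, or return non-zero if the tail does not look like N:N:N:N.
scsi_addr() {
    addr=${1##*/}    # strip everything up to and including the last "/"
    printf '%s\n' "$addr" | grep -Eqx '[0-9]+(:[0-9]+){3}' || return 1
    printf '%s\n' "$addr"
}

PHYSDEVPATH="/devices/pci0000:00/0000:00:1d.7/usb1/1-1/1-1:1.0/host16/target16:0:0/16:0:0:0"
scsi_addr "$PHYSDEVPATH"
```

With a path that does not end that way, such as /block/sdh, the helper prints nothing and returns non-zero, so a caller can bail out instead of writing to the wrong place.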

So, I think the script should be like this, and this is what I'm using now and I can't make it fail now.
-----
#!/bin/bash -e
# <email address hidden>
# https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/61235
#
# config option, two possible ways to fix drive
# true: allows the drive to spin down when idle, tells kernel to wait gracefully.
# false: drive never goes to sleep.
ALLOW_RESTART=true # true or false

# don't run unless udev ran us
[ "$UDEVD_EVENT" = "1" ] || { echo "This should only be run from udev." ; exit 1 ; }

# don't do anything unless the target is a disk, but don't error either.
[ "$SUBSYSTEM" = "block" ] || exit 0

# if/else rather than "&& { } || { }", so a failed echo can't fall through to sdparm
if $ALLOW_RESTART ; then
        echo 1 > "/sys/class/scsi_disk/${PHYSDEVPATH##*/}/allow_restart"
else
        sdparm --clear STANDBY -6 "$DEVNAME"
fi
-----

And the udev rule just runs the script with no %k, no command line args at all.

Maybe the udev rule can be improved even more so that it doesn't match 3 times but just matches once corresponding to the pass 3 above?

Maybe a little more funky shell syntax and we can eliminate the need for a script at all, just have an echo command right there in the udev rule?

I tried this
RUN+="[ x$SUBSYSTEM = xblock ] && echo echo 1 > /sys/class/scsi_disk/${PHYSDEVPATH##*/}/allow_restart"

but syslog just says failed to exec "/lib/udev/["

But at least now we know a better place to put the script than /usr/sbin, eh?
Put it in /lib/udev and have the rule just say RUN+="fix_usb_hd"
I did that and it's working fine.

This didn't generate any error in syslog, but also didn't work:
RUN+="/usr/bin/test x$SUBSYSTEM = xblock && /bin/echo 1 > /sys/class/scsi_disk/${PHYSDEVPATH##*/}/allow_restart"
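
A possible explanation for both failed attempts, and this is my assumption rather than something confirmed in this thread: udev seems to execute the RUN program directly instead of handing the string to a shell, so "&&" and ">" are never interpreted and just arrive as ordinary arguments. The sketch below imitates that from a normal shell by quoting the operators. The scratch file name is made up for the demo.

```shell
# Quoting the operators mimics a direct exec: they reach "test" as plain
# arguments instead of being interpreted by a shell.
target="${TMPDIR:-/tmp}/allow_restart_demo.$$"   # hypothetical scratch path
status=0
env test xblock = xblock '&&' /bin/echo 1 '>' "$target" 2>/dev/null || status=$?

echo "exit status: $status"
if [ -e "$target" ]; then
    echo "file was written"
else
    echo "file was not written"
fi
```

If that is what happens inside udev, test just complains about extra arguments and nothing is ever written, which would fit the "no error, but didn't work" result.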

It's nice and reliable and automatic and accurate, hitting just the device it's supposed to hit, and always hitting it now. In the course of all the testing and trial and error, the drive letter has jumped all over the place and that number in the path to the allow_restart file has climbed into the 30s, and it's working right every time.

Note: if you did the sdparm fix and the drive no longer goes idle, this is the reverse sdparm command to put the drive back to defaults. Power-cycling the drive does not reset it.

slosh:/sbin # sdparm -al /dev/sdc
    /dev/sdc: Seagate FreeAgentDesktop 100D
    Direct access device specific parameters: WP=0 DPOFUA=0
Power condition [po] mode page:
  IDLE 0 [cha: n, def: 0, sav: 0] Idle timer active
  STANDBY 0 [cha: n, def: 1, sav: 0] Standby timer active
  ICT 0 [cha: n, def: 0, sav: 0] Idle condition timer (100 ms)
  SCT 0 [cha: n, def:9000, sav: 0] Standby condition timer (100 ms)

slosh:/sbin # sdparm --defaults --page=po -6 /dev/sdc
    /dev/sdc: Seagate FreeAgentDesktop 100D

slosh:/sbin # sdparm -al /dev/sdc
    /dev/sdc: Seagate FreeAgentDesktop 100D
    Direct access device specific parameters: WP=0 DPOFUA=0
Power condition [po] mode page:
  IDLE 0 [cha: n, def: 0, sav: 0] Idle timer active
  STANDBY 1 [cha: y, def: 1, sav: 1] Standby timer active
  ICT 0 [cha: n, def: 0, sav: 0] Idle condition timer (100 ms)
  SCT 9000 [cha: y, def:9000, sav:9000] Standby condition timer (100 ms)
slosh:/sbin #

or you can change the standby time to something other than 15 minutes:

1 minute, for faster debugging of the restart
slosh:/sbin # sdparm --set SCT=600 -6 /dev/sdc

1 hour: sdparm --set SCT=36000 -6 /dev/sdc
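
Since SCT is counted in units of 100 ms, the value for any other timeout is just minutes x 60 x 10. A tiny sketch of the arithmetic follows; minutes_to_sct is a made-up helper name, not anything that ships with sdparm.

```shell
# Hypothetical helper: convert minutes to an SCT value (units of 100 ms).
minutes_to_sct() {
    echo $(( $1 * 60 * 10 ))
}

minutes_to_sct 1     # 600
minutes_to_sct 15    # 9000, the drive's default
minutes_to_sct 60    # 36000
```

So a 5 minute timeout would be something like: sdparm --set SCT=$(minutes_to_sct 5) -6 /dev/sdc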

I confirmed the drive is really off simply by picking the unit up and turning it back and forth in my hand. I needed a physical check because the light stays on, I get no error accessing it, and it's silent even when running. When it's running there is gyroscopic resistance to being tilted. When it's off it's like a brick. Yet it restarts gracefully. No more crashed filesystems.
It also works outside of a mounted filesystem. For example, starting with the device fully idle, spun down, and not mounted, I say "mount /backups" and the mount command just pauses while the drive spins up, and 5 seconds later it returns. Wait a minute for it to go idle again, ls /backups, it pauses, then works. No errors anywhere along the way.

Sorry this was long-winded, but now we know everything. Except the answer to the poor original poster's problem, since he seems to have a completely different issue!

bkw